diff --git a/_index.db b/_index.db
index 6467fdb78..22788dcd9 100644
Binary files a/_index.db and b/_index.db differ
diff --git a/data/en.wikipedia.org/wiki/Absolute_dating-0.md b/data/en.wikipedia.org/wiki/Absolute_dating-0.md
new file mode 100644
index 000000000..42f6b0d7a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Absolute_dating-0.md
@@ -0,0 +1,41 @@
+---
+title: "Absolute dating"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Absolute_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:25.571074+00:00"
+instance: "kb-cron"
+---
+
+Absolute dating is the process of determining an age on a specified chronology in archaeology and geology. Absolute dating provides a numerical age or range, in contrast with relative dating, which places events in order without any measure of the age between events. Some scientists prefer the terms chronometric dating or calendar dating, as the use of the word "absolute" may imply an unwarranted certainty of accuracy.  
+In archaeology, absolute dating is usually based on the physical, chemical, and life properties of the materials of artifacts, buildings, or other items that have been modified by humans and by historical associations with materials with known dates (such as coins and historical records). For example, coins found in excavations may have their production date written on them, or there may be written records describing the coin and when it was used, allowing the site to be associated with a particular calendar year. Absolute dating techniques include radiocarbon dating of wood or bones, potassium-argon dating, and trapped-charge dating methods such as thermoluminescence dating of glazed ceramics.
+In historical geology, the primary methods of absolute dating use the radioactive decay of elements trapped in rocks or minerals. The isotope systems range from those suited to younger organic remains (radiocarbon dating with 14C) to systems such as uranium–lead dating that allow determination of absolute ages for some of the oldest rocks on Earth.
+
+== Radiometric techniques ==
+
+Radiometric dating is based on the known and constant rate of decay of radioactive isotopes into their radiogenic daughter isotopes. Particular isotopes are suitable for different applications due to the types of atoms present in the mineral or other material and its approximate age. For example, techniques based on isotopes with half-lives in the thousands of years, such as carbon-14, cannot be used to date materials that have ages on the order of billions of years, as the detectable amounts of the radioactive atoms and their decayed daughter isotopes will be too small to measure within the uncertainty of the instruments.
+
+=== Radiocarbon dating ===
+
+One of the most widely used and well-known absolute dating techniques is carbon-14 (or radiocarbon) dating, which is used to date organic remains. This is a radiometric technique since it is based on radioactive decay. Cosmic radiation entering Earth's atmosphere produces carbon-14, and plants take in carbon-14 as they fix carbon dioxide. Carbon-14 moves up the food chain as animals eat plants and as predators eat other animals. With death, the uptake of carbon-14 stops.
+It takes 5,730 years for half the carbon-14 to decay to nitrogen; this is the half-life of carbon-14. After another 5,730 years, only one-quarter of the original carbon-14 will remain.  After yet another 5,730 years, only one-eighth will be left.
+By measuring the carbon-14 in organic material, scientists can determine the date of death of the organic matter in an artifact or ecofact.
+
+==== Limitations ====
+The relatively short half-life of carbon-14, 5,730 years, makes dating reliable only up to about 60,000 years. The technique often cannot pinpoint the date of an archaeological site better than historic records, but is highly effective for precise dates when calibrated with other dating techniques such as tree-ring dating.
+An additional problem with carbon-14 dates from archaeological sites is known as the "old wood" problem. In dry, desert climates, organic materials like dead trees can remain in their natural state for hundreds of years. When people eventually use these materials as firewood or building supplies, they become part of the archaeological record. Thus, dating that particular tree does not necessarily indicate when the fire burned or the structure was built.
+For this reason, many archaeologists prefer to use samples from short-lived plants for radiocarbon dating. The development of accelerator mass spectrometry (AMS) dating, which allows a date to be obtained from a very small sample, has been very useful in this regard.
+
+=== Potassium-argon dating ===
+
+Other radiometric dating techniques are available for earlier periods.  One of the most widely used is potassium–argon dating (K–Ar dating).  Potassium-40 is a radioactive isotope of potassium that decays into argon-40.  The half-life of potassium-40 is 1.3 billion years, far longer than that of carbon-14, allowing much older samples to be dated. Potassium is common in rocks and minerals, allowing many samples of geochronological or archaeological interest to be dated.
+Argon, a noble gas, is not commonly incorporated into such samples except when produced in situ through radioactive decay. The date measured reveals the last time that the object was heated past the closure temperature at which the trapped argon can escape the lattice. K–Ar dating was used to calibrate the geomagnetic polarity time scale.
+
+== Luminescence dating ==
+
+=== Thermoluminescence ===
+Thermoluminescence dating also dates items to the last time they were heated. This technique is based on the principle that all objects absorb radiation from the environment.  This process frees electrons within minerals that remain caught within the item.
+Heating an item to 500 degrees Celsius or higher releases the trapped electrons, producing light.  This light can be measured to determine the last time the item was heated.
+Radiation levels do not remain constant over time. Fluctuating levels can skew results – for example, if an item went through several high-radiation eras, thermoluminescence will return an older date for the item.  Many factors can also spoil the sample before testing: exposing the sample to heat or direct light may cause some of the electrons to dissipate, causing the item to date younger.
+Because of these and other factors, thermoluminescence dates are accurate to within about 15% at best.  The technique cannot be used to accurately date a site on its own.  However, it can be used to confirm the antiquity of an item.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Absolute_dating-1.md b/data/en.wikipedia.org/wiki/Absolute_dating-1.md
new file mode 100644
index 000000000..e6cf53ece
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Absolute_dating-1.md
@@ -0,0 +1,50 @@
+---
+title: "Absolute dating"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Absolute_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:25.571074+00:00"
+instance: "kb-cron"
+---
+
+=== Optically stimulated luminescence (OSL) ===
+Optically stimulated luminescence (OSL) dating constrains the time at which sediment was last exposed to light. During sediment transport, exposure to sunlight 'zeros' the luminescence signal. Upon burial, the sediment accumulates a luminescence signal as natural ambient radiation gradually ionises the mineral grains.
+Careful sampling under dark conditions allows the sediment to be exposed to artificial light in the laboratory, which releases the OSL signal. The amount of luminescence released is used to calculate the equivalent dose (De) that the sediment has acquired since deposition, which can be used in combination with the dose rate (Dr) to calculate the age.
+
+== Dendrochronology ==
+
+Dendrochronology, or tree-ring dating, is the scientific method of dating based on the analysis of patterns of tree rings, also known as growth rings. Dendrochronology can date the time at which tree rings were formed, in many types of wood, to the exact calendar year.
+Dendrochronology has three main areas of application: paleoecology, where it is used to determine certain aspects of past ecologies (most prominently climate); archaeology, where it is used to date old buildings, etc.; and radiocarbon dating, where it is used to calibrate radiocarbon ages (see below).
+In some areas of the world, it is possible to date wood back a few thousand years, or even many thousands.  Currently, the maximum for fully anchored chronologies is a little over 11,000 years from present.
+
+== Amino acid dating ==
+
+Amino acid dating is a dating technique used to estimate the age of a specimen in paleobiology, archaeology, forensic science, taphonomy, sedimentary geology and other fields.  This technique relates changes in amino acid molecules to the time elapsed since they were formed. All biological tissues contain amino acids.  All amino acids except glycine (the simplest one) are optically active, having an asymmetric carbon atom. This means that the amino acid can have two different configurations, "D" or "L", which are mirror images of each other.
+With a few important exceptions, living organisms keep all their amino acids in the "L" configuration.  When an organism dies, control over the configuration of the amino acids ceases, and the ratio of D to L moves from a value near 0 towards an equilibrium value near 1, a process called racemization.  Thus, measuring the ratio of D to L in a sample enables one to estimate how long ago the specimen died.
+
+== See also ==
+Astronomical chronology
+Age of the Earth
+Age of the universe
+Chronological dating, archaeological chronology
+Absolute dating, this article
+Relative dating
+Phase (archaeology)
+Archaeological association
+Geochronology
+Chronostratigraphy
+Future of the Earth
+Geologic time scale
+Geological history of Earth
+Plate reconstruction
+Plate tectonics
+Thermochronology
+Timeline of natural history
+List of geochronologic names
+General
+Consilience, evidence from independent, unrelated sources can "converge" on strong conclusions
+
+== References ==
+
+== Further reading ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Acali-0.md b/data/en.wikipedia.org/wiki/Acali-0.md
new file mode 100644
index 000000000..8ec8df7c3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Acali-0.md
@@ -0,0 +1,32 @@
+---
+title: "Acali"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Acali"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:28.400557+00:00"
+instance: "kb-cron"
+---
+
+The Acali expedition (or Acali experiment or the Sex Raft) was a 1973 social experiment that aimed to investigate interpersonal relationships in conditions of limited space and social isolation. The experiment was conceived by Mexican anthropologist Santiago Genovés, who had previously been a crew member of Thor Heyerdahl's Ra expedition. The participants showed restraint rather than aggression, which frustrated Genovés and led him to try to provoke conflict; at one point he took command of the raft. Despite these attempts, the group remained peaceful.
+The raft had a complement of eleven people: five men and six women. It left Las Palmas, Spain, on 12 May 1973 and took 101 days to drift across the Atlantic Ocean and reach Cozumel, Mexico, with a single stopover in Barbados. Frequently dubbed the "Sex Raft" by the media, it was the subject of a 2018 documentary film The Raft, by Marcus Lindeen.
+
+
+== The Raft ==
+The name of the raft, Acali, comes from the Nahuatl language and means "the house on the water".
+The raft was built specifically for the experiment. It had a steel hull and dimensions of 12 by 7 metres. The cabin measured 4 × 4 metres. It was designed by José Antonio Mandri and Colin Mudie, and it was built in Newcastle upon Tyne, England.
+
+
+== Participants ==
+
+
+== See also ==
+Stanford prison experiment (1971)
+
+
+== References ==
+
+
+== External links ==
+Raft of Passion - Episode of Snap Judgment in which Mary Gidley recounts her experience with the Acali experiment
+The Raft - Documentary film
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Acanthochronology-0.md b/data/en.wikipedia.org/wiki/Acanthochronology-0.md
new file mode 100644
index 000000000..2619ee7b1
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Acanthochronology-0.md
@@ -0,0 +1,23 @@
+---
+title: "Acanthochronology"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Acanthochronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:26.759970+00:00"
+instance: "kb-cron"
+---
+
+Acanthochronology is the study of cactus spines or Euphorbia thorns grown in time-ordered sequence (i.e. in series). Physical, morphological or chemical characteristics of the spines or thorns, together with information about their relative order or absolute age, are used to study past climate or plant physiology.
+
+For example, columnar cactus spines grow from the apex of the plant. After several weeks the spines stop growing and are displaced to the side of the stem. The old spines remain in place for decades as new spines are created at the continually growing apex. The result is that along each external "rib" of the cactus is a series of spines arranged in the order they grew in – the oldest spines are at the bottom and the youngest spines are at the top. These spines can be dated using bomb-spike carbon-14, and isotopes of carbon (carbon-13) and oxygen (oxygen-18) may be used to infer past climate (e.g. precipitation or temperature), plant stem growth or plant physiology (e.g. photosynthetic processes). Alternatively, the width of small transverse bands in the spine may be used to infer daily information about cloud cover or plant productivity, although this remains to be tested. It has also been shown that regular waxy banding on the sides of a Costa Rican cactus (Lemaireocereus aragonii) indicates annual growth and can be used as a chronometer.
+
+This sub-discipline of paleoclimatology and ecophysiology is relatively new. Acanthochronology is closely related to dendrochronology, dendroclimatology and isotope geochemistry and borrows many of the methods and techniques from these sub-disciplines of the Earth sciences. It also draws heavily from the field of ecophysiology, a branch of biology, to ascribe spine or thorn characteristics to particular environmental or physiological variables.
+The first peer-reviewed article to present and explain an isotope spine series was from a saguaro cactus in Tucson, Arizona. This and other work shows that radiocarbon and isotope time-series derived from spines can be used for demographic or palaeoclimate studies.
+
+
+== References ==
+
+
+== Further reading ==
+Doménech-Carbó, Antonio (2015). "Dating: An analytical task". Chemtexts. 1. doi:10.1007/s40828-014-0005-6.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion-0.md b/data/en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion-0.md
new file mode 100644
index 000000000..bba415801
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion-0.md
@@ -0,0 +1,37 @@
+---
+title: "Aircraft Nuclear Propulsion"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:29.596684+00:00"
+instance: "kb-cron"
+---
+
+The Aircraft Nuclear Propulsion (ANP) program and the preceding Nuclear Energy for the Propulsion of Aircraft (NEPA) project worked to develop a nuclear propulsion system for aircraft. The United States Army Air Forces initiated Project NEPA on May 28, 1946. NEPA operated until May 1951, when the project was transferred to the joint Atomic Energy Commission (AEC)/USAF ANP. The USAF pursued two different systems for nuclear-powered jet engines: the Direct Air Cycle concept, which was developed by General Electric, and the Indirect Air Cycle, which was assigned to Pratt & Whitney.  The program was intended to develop and test the Convair X-6, but was canceled in 1961 before that aircraft was built. The total cost of the program from 1946 to 1961 was about $1 billion.
+
+== Types ==
+
+=== Direct air cycle ===
+
+Direct cycle nuclear engines resemble conventional jet engines without combustion chambers. The hot compressed air produced by the compressor section is instead directed into the nuclear reactor core. The air is heated further, thereby cooling the reactor. The air is then expanded through a turbine, powering the compressor, before it is exhausted at high velocity to provide thrust. The end result is that instead of using jet fuel, an aircraft could rely on the heat from nuclear reactions.
+The General Electric program, which was based at Evendale, Ohio, was pursued because of its advantages in simplicity, reliability, suitability and quick-start ability. Conventional jet engine compressor and turbine sections were used, with the compressed air run through the reactor to be heated by it before being exhausted through the turbine.
+
+=== Indirect air cycle ===
+Indirect cycling involves thermal exchange outside of the core with compressor air being sent to a heat exchanger. The nuclear reactor core would heat up pressurized water or liquid metal and send it to the heat exchanger as well. That hot liquid would be cooled by the air; the air would be heated by the liquid, sent through a turbine (powering the compressor), then out the exhaust, providing thrust.
+The indirect air cycle program was assigned to Pratt & Whitney, at a facility near Middletown, Connecticut. This concept would have produced far less radioactive pollution. One or two loops of liquid metal would carry the heat from the reactor to the engine. This program involved a great deal of research and development of many light-weight systems suitable for use in aircraft, such as heat exchangers, liquid-metal turbopumps and radiators. The Indirect Cycle program never came anywhere near producing flight-ready hardware.
+
+== Experimental reactors and projects ==
+
+=== Aircraft Reactor Experiment ===
+
+The United States Aircraft Reactor Experiment (ARE) was a 2.5 MWth thermal-spectrum nuclear reactor experiment designed to attain a high power density and high output temperature for use as an engine in a nuclear-powered bomber aircraft. The advantage of a nuclear-powered aircraft over a conventionally powered aircraft is that it could remain airborne orders of magnitude longer and provide an effective nuclear strategic deterrent to a nuclear-armed Soviet adversary. The ARE was the first molten salt reactor (MSR) to be built and operated. It used the molten fluoride salt NaF–ZrF4–UF4 (53–41–6 mol%) as fuel, was moderated by beryllium oxide (BeO) in a hexagonal configuration, and had a peak temperature of 860 °C. A redundant liquid sodium coolant system was used to cool the moderator and reflector materials. A secondary helium gas coolant loop was circulated around the primary coolant to transfer heat to a water radiator, where the heat output was dumped to the atmosphere. Reactivity control rods were installed, and it was found that the control rods did not determine the output power of the ARE; rather, the power demand did, which affected the outlet and inlet temperatures because of the negative temperature coefficient of reactivity. The ARE was operated at power for 221 hours up to a peak of 2.5 MWth.
+
+=== MX-1589 project ===
+
+On September 5, 1951, the USAF awarded Convair a contract to fly a nuclear reactor on board a modified Convair B-36 Peacemaker under the MX-1589 project of the ANP program. The NB-36H Nuclear Test Aircraft (NTA) was to study shielding requirements for an airborne reactor, to determine whether a nuclear aircraft was feasible. This was the only known airborne reactor experiment by the U.S. with an operational nuclear reactor on board. The NTA flew a total of 47 times testing the reactor over West Texas and Southern New Mexico. The reactor, named the Aircraft Shield Test Reactor (ASTR), was operational but did not power the aircraft; the primary purpose of the flight program was testing the effectiveness of the shielding. Based on the results of the NTA, the X-6 and the entire nuclear aircraft program were abandoned in 1961.
+
+=== Heat Transfer Reactor Experiments ===
+
+As part of the AEC/USAF ANP program, in 1956 modified General Electric J47s were first operated on nuclear power using a reactor test assembly known as Heat Transfer Reactor Experiment 1 (HTRE-1). HTRE-1, which used vertically oriented control rods, was reconfigured with a removable core to become HTRE-2 for additional testing. HTRE-3 was built separately to test horizontally oriented control rods as appropriate for use in an airframe.
+The decommissioned HTRE-2 and HTRE-3 reactors and test assemblies can be viewed by the public in the Experimental Breeder Reactor I parking lot at Idaho National Laboratory.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion-1.md b/data/en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion-1.md
new file mode 100644
index 000000000..3a6614971
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion-1.md
@@ -0,0 +1,40 @@
+---
+title: "Aircraft Nuclear Propulsion"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Aircraft_Nuclear_Propulsion"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:29.596684+00:00"
+instance: "kb-cron"
+---
+
+=== Pratt & Whitney Aircraft Reactor-1 ===
+On February 5, 1957, another reactor was made critical at the Critical Experiments Facility of the Oak Ridge National Laboratory (ORNL) as part of the circulating-fuel reactor program of the Pratt & Whitney Aircraft Company (PWAC). This was called the PWAR-1, the Pratt & Whitney Aircraft Reactor-1. The purpose of the experiment was to verify experimentally the theoretically predicted nuclear properties of a PWAC reactor. The experiment was only run briefly; by the end of February 1957 all data had been taken and disassembly had begun. The experiment was run at essentially zero nuclear power. The operating temperature was held constant at approximately 675 °C (1,247 °F), which corresponds closely to the design operating temperature of the PWAR-1 moderator; this temperature was maintained by external heaters. Like the 2.5 MWth ARE, the PWAR-1 used NaF-ZrF4-UF4 as the primary fuel and coolant.
+
+== Cancellation ==
+Technological competition with the Soviet Union (as represented by the launch of Sputnik 1) and continued strong support from the Air Force allowed the program to continue, despite divided leadership between the DOD and the AEC. Numerous test facilities were funded and constructed through the 1950s and 1960–61 in order to produce a flight-worthy nuclear power unit, including one at the Oak Ridge National Laboratory (ORNL). While the ARE successfully demonstrated operation of an MSR concept, the program was canceled by President Kennedy on March 26, 1961, citing the high cost with no flight-worthy reactor having been produced up to that point – "15 years and about $1 billion have been devoted to the attempted development of a nuclear-powered aircraft; but the possibility of achieving a militarily useful aircraft in the foreseeable future is still very remote". Also contributing to the cancellation was the entry into active service of the first intercontinental ballistic missiles in September 1959, which all but eliminated the need for a nuclear-powered aircraft as a strategic deterrent. Nevertheless, the results of the ARE program prompted scientists and engineers at ORNL to submit a preliminary design proposal to the Atomic Energy Commission for a 30 MWth experimental MSR to explore MSR as a civilian power station concept. The result of the proposal was direction from the Atomic Energy Commission for ORNL to design, construct, and operate the Molten-Salt Reactor Experiment (MSRE).
+
+== See also ==
+List of nuclear-powered aircraft
+Georgia Nuclear Aircraft Laboratory
+WS-125, 1955 USAF requirement for nuclear powered bomber
+NERVA
+Project Pluto to develop nuclear powered ramjet engines for use in cruise missiles
+Project Rover to develop a nuclear thermal rocket
+Tupolev Tu-95LAL
+
+== References ==
+
+== External links ==
+
+Gantz, Kenneth (1960), Nuclear flight; the United States Air Force programs for atomic jets, missiles, and rockets., New York, Duell, Sloan and Pearce.
+Thorton, G (June 28, 1962), Comprehensive Technical Report, General Electric Direct-Air-Cycle Aircraft Nuclear Propulsion Program, Program Summary and References, US Atomic Energy Commission (AEC), OSTI 1048124.
+Dreams of Nuclear Flight — The NEPA and ANP programs (PDF), Wisc, archived from the original (PDF) on June 18, 2010, retrieved August 12, 2009.
+The Bureau of Atomic Tourism, archived from the original on December 31, 2010.
+Wendt, Gerald (1951), A Scientist Preview The First Atomic Airplane (article) with illustrations on the subject of using an atomic reactor to power an aircraft.
+Martin, Richard (May 8, 2012), "ANP", SuperFuel, St. Martin's Publishing, pp. 109–12, ISBN 9780230341913.
+SOVİET TOP SECRET NUCLEAR AIRPLANE M-60 Akademi Portal by Akademi Portal web site (in English)
+COMPREHENSIVE TECHNICAL REPORT GE DIRECT AIR CYCLE AIRCRAFT NUCLEAR PROPULSION PROGRAM (in English)
+"Flyable" Reactors & Neutron Coupling (in English)
+Aircraft Nuclear Propulsion Program: Hearing before the Subcommittee on Research and Development of the Joint Committee on Atomic Energy (Report). July 23, 1959. hdl:2027/uiug.30112065524198.
+Declassified Aircraft Nuclear Propulsion Program: Manned Aircraft Progress Report 1956-1958
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Amino_acid_dating-0.md b/data/en.wikipedia.org/wiki/Amino_acid_dating-0.md
new file mode 100644
index 000000000..2f4438c06
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Amino_acid_dating-0.md
@@ -0,0 +1,58 @@
+---
+title: "Amino acid dating"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Amino_acid_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:27.976616+00:00"
+instance: "kb-cron"
+---
+
+Amino acid dating or racemization dating is a dating technique used to estimate the age of a specimen in paleobiology, molecular paleontology, archaeology, forensic science, taphonomy, sedimentary geology and other fields.  This technique relates changes in amino acid molecules to the time elapsed since they were formed.
+
+
+== Background ==
+
+
+=== Chemistry ===
+
+Amino acids are a set of organic compounds that are used by living organisms to synthesise proteins. All amino acids (except glycine) have one or more pairs of stereoisomers, isomers which share the same bond order but are organized differently in 3D space. Amino acid stereoisomer pairs that are optically active and non-superimposable mirror images of each other are enantiomers; pairs that are not mirror images are diastereomers or epimers. Biological systems are stereoselective, preferring certain stereoisomers for chemical reactions; living organisms keep all their amino acids in their "left-handed" (L or levo-) forms (a state called homochirality) because they are unable to use the "right-handed" (D or dextro-) forms for protein synthesis. This ratio of D and L forms is unstable, as the molecules may undergo reactions (known as racemization or epimerization respective to the type of stereoisomer pair involved) and become the other stereoisomer.
+When an organism becomes unable to keep its amino acids in that unbalanced ratio, such as by dying or shedding tissue, the system will proceed towards chemical equilibrium. Measuring the progress of this interconversion reaction
+allows estimation of an organism's time of death, if environmental variables like moisture and temperature are accounted for.
+
+
+=== Amino acids and environmental conditions ===
+Amino acids commonly used for amino acid dating analysis are leucine, aspartic acid, valine, glutamic acid, and diastereomer isoleucine. 
+The properties of the amino acid(s) chosen for analysis influence what kind of dating can be performed. Amino acid interconversion reactions happen at a variety of speeds: aspartic acid racemizes very quickly and hence is used for recent samples where high resolution is important, while valine and leucine take much longer to racemize and are more appropriate for older fossils. Additionally, these reaction rates are sensitive to temperature, to a degree depending on the specific interconversion reaction. The racemization rate of aspartic acid varies with small changes in temperature, while valine's racemization rate is less temperature dependent.
+Besides higher temperatures accelerating interconversion, other environmental variables also impact reaction rates. Wetter environments produce faster reaction rates, and interconversion reactions may be catalyzed by the presence of acids, bases, or metal cations. The chosen host organisms or taxa also introduce bias into age estimates. 
+Amino acids which are bound within peptides interconvert more slowly than those which are free or are occupying the terminal position of peptide chains. The degree of hydrolysis of peptides (and therefore the speed at which equilibrium approaches) increases with fossil age.
+
+
+== Applications ==
+Amino acid dating has applications in archaeology, stratigraphy, oceanography, paleogeography, paleobiology, and paleoclimatology. These include dating correlation, relative dating, sedimentation rate analysis, sediment transport studies, conservation paleobiology, taphonomy and time-averaging, sea level determinations, and thermal history reconstructions.
+Amino acid dating may be used to date samples too old for radiocarbon dating (which has a maximum range of 40 ka to 0 ka), or too young for potassium-argon dating (which has a range of 40 ka to 150 ka) to be helpful. Verification of radiocarbon and other dating techniques by comparison with amino acid dating is also possible. The 'filling in' of large probability ranges, such as those caused by variation in 14C levels throughout the biosphere, has sometimes been possible as well.
+Bone, shell, and sediment studies have contributed much to the paleontological record, including that relating to hominoids.  Many studies have been undertaken in paleopathology and dietary selection, paleozoogeography and indigeneity, taxonomy and taphonomy, and DNA viability. Human cultural changes and their effects on local ecologies have been assessed using this technique; the differentiation of cooked from uncooked bone, shell, and residue is sometimes possible.
+Amino acid racemization also has a role in tissue and protein degradation studies, particularly useful in developing museum preservation methods. These studies have produced models of protein adhesive and other biopolymer deteriorations and the concurrent pore system development. The reduction in bodily repair capability during aging is important to studies of senescence and age-associated disease, and allows the determination of age in living animals.
+Forensic science can use this technique to estimate the age of a cadaver, or to estimate the age of an objet d'art in order to determine its authenticity.
+
+
+== Methods ==
+Amino acid racemization analysis consists of sample preparation, isolation of the target amino acid, and measurement of its D:L ratio. Sample preparation entails the identification, raw extraction, and separation of proteins into their constituent amino acids, typically by grinding followed by acid hydrolysis. The amino acid derivative hydrolysis product can be combined with a chiral-specific fluorescent compound, separated by chromatography or electrophoresis, and the particular amino acid D:L ratio determined by fluorescence. Alternatively, the particular amino acid can be separated by chromatography or electrophoresis, combined with a metal cation, and the D:L ratio determined by mass spectrometry.
+Conventional racemization analysis tends to report a D-alloisoleucine / L-isoleucine ratio (A/I or D/L). This amino acid ratio has the advantages of being relatively easy to measure and being chronologically useful through the Quaternary.
+Reversed phase HPLC techniques can measure up to 9 amino acids useful in geochronology over different time scales on a single chromatogram (aspartic acid, glutamic acid, serine, alanine, arginine, tyrosine, valine, phenylalanine, leucine).
+Amino acid dating relies on the assumption that the fraction of amino acids being studied has been a closed system since its formation, exchanging nothing with its surroundings. Removing contaminants decreases variability in results by ensuring that analysis is performed only on the most representative fraction of amino acids. These cleaning methods may include soaking powdered biomineral samples in bleach prior to measuring D/L ratio, destroying the amino acids in the more porous, open areas while leaving the fraction trapped inside the grains unscathed.
+
+
+== References ==
+
+
+== External links ==
+
+
+=== Active laboratories ===
+Northern Arizona University Amino Acid Geochronology Laboratory
+University of Massachusetts Amino Acid Geochronology Laboratory
+The University of Colorado Amino Acid Geochronology Lab
+University of Delaware Research Group
+University of York BioArCh
+Madrid School of Mines Biomolecular Stratigraphy Laboratory
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Anecdotal_evidence-0.md b/data/en.wikipedia.org/wiki/Anecdotal_evidence-0.md
index efb014538..e04667585 100644
--- a/data/en.wikipedia.org/wiki/Anecdotal_evidence-0.md
+++ b/data/en.wikipedia.org/wiki/Anecdotal_evidence-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Anecdotal_evidence"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:16:52.333239+00:00"
+date_saved: "2026-05-05T09:55:50.761018+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Appearance_event_ordination-0.md b/data/en.wikipedia.org/wiki/Appearance_event_ordination-0.md
new file mode 100644
index 000000000..dd7b70167
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Appearance_event_ordination-0.md
@@ -0,0 +1,39 @@
+---
+title: "Appearance event ordination"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Appearance_event_ordination"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:29.143585+00:00"
+instance: "kb-cron"
+---
+
+Appearance event ordination or AEO is a scientific method for biochronology through the ordering of the appearance of fossil mammal genera by multivariate analysis, using conjunctional (overlapping) and disconjunctional (nonoverlapping) range distributions in large sets of data.
+
+
+== Process ==
+AEO is based on faunal overlap and stratigraphic superposition to derive a best-fit sequence of first and last appearance events.
+
+
+=== Step 1 ===
+The first step is to translate patterns of overlap and superposition into pairwise first-before-last statements. The wolf species Canis edwardii and Canis armbrusteri are used as example taxa for the following patterns. Each statement means that C. edwardii, for example, must have first appeared before C. armbrusteri last appeared. This is true whenever either (1) C. edwardii and C. armbrusteri have been found together in at least one non-time-averaged fossil collection, or (2) C. edwardii is found lower in at least one lithostratigraphic section than C. armbrusteri.
+
+
+=== Step 2 ===
+A multivariate ordination algorithm is applied to derive a first-pass, hypothesized sequence of first and last appearances. The minimal constraint on this sequence is that if there is an observed, real-world first-before-last statement for any pair of taxa (such as C. edwardii before C. armbrusteri), the hypothesized event sequence must replicate it. Then, the program shuffles the events using a maximum likelihood criterion. The criterion essentially seeks to pull apart as many hypothesized age range overlaps as possible, especially if they involve common taxa. Taxa are defined as "common" if they are known to overlap with a large fraction of the taxa with which they are implied to overlap.
+
+
+=== Step 3 ===
+Once the relative event sequence has been established, it is converted into numerical time with a nonlinear interpolation algorithm that compares event sequence positions and geochronological age estimates for collections that have them. The calibration only uses: 
+
+40Argon/39Argon dates
+Uranium-thorium dates for some Pleistocene collections
+Paleomagnetic dates that derive from unambiguous, narrow correlations inferred using nonfaunal tie points such as the position in the section of the Cretaceous–Paleogene boundary, Paleocene-Eocene boundary, or Recent
+
+
+== NALMA vs. AEO ==
+The North American land mammal ages procedure uses subjective opinions by published sources and/or authors, citing authors such as Michael O. Woodburne, Robert W. Wilson, and J. David Archibald.
+Appearance event ordination uses objective, explicit, recordable, repeatable, and quantitative analyses of faunal and biostratigraphic data to arrive at a conclusion, according to John Alroy.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Archival_research-0.md b/data/en.wikipedia.org/wiki/Archival_research-0.md
new file mode 100644
index 000000000..24896aab8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Archival_research-0.md
@@ -0,0 +1,24 @@
+---
+title: "Archival research"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Archival_research"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:51.960083+00:00"
+instance: "kb-cron"
+---
+
+Archival research is a type of research which involves seeking out and extracting evidence from archival records. These records may be held either in collecting institutions, such as libraries and museums, or in the custody of the organization (whether a government body, business, family, or other agency) that originally generated or accumulated them, or in that of a successor body (transferring, or in-house archives). Archival research can be contrasted with (1) secondary research (undertaken in a library or online), which involves identifying and consulting secondary sources relating to the topic of enquiry; and (2) with other types of primary research and empirical investigation such as fieldwork and experiment.
+
+== History of archives organizations ==
+The oldest archives have been in existence for hundreds of years. For instance, in Europe, the General Archive of the Crown of Aragon was instituted in 1318, and the Vatican Secret Archives, which were started in the 17th century, contain state papers, papal account books, and papal correspondence dating back to the 8th century. The Archives Nationales in France was founded in 1790 during the French Revolution and has holdings that date back to AD 625, and other European archives have a similar provenance. Archives in the modern world, while of more recent date, may also hold material going back several centuries. For example, the United States National Archives and Records Administration (NARA) was originally established in 1934. The NARA contains records and collections dating back to the founding of the United States in the 18th century. Among the collections of the NARA are the Declaration of Independence, the Constitution of the United States, and an original copy of Magna Carta. The British National Archives (TNA) traces its history to the creation of the Public Record Office in 1838, while other state and national bodies were also formed in the late 19th and early 20th centuries.
+Universities are another venue for archival holdings and manuscript collections. Most universities have archival holdings that chronicle the business of the university. Some universities also have archives or manuscript collections that focus on one aspect or another of the culture of the state or country in which the university is located. Schools and religious institutions, as well as local studies and history collections, museums and research institutions may all hold archives.
+The reason for highlighting the breadth and depth of archives is to give some idea of the difficulties facing archival researchers. Some of these archives hold vast quantities of records. For example, the Vatican Secret Archives have upwards of 52 miles of archival shelving. An increasing number of archives are now accepting digital transfers, which can also present challenges for display and access.
+
+== Archival research methodologies ==
+Archival research lies at the heart of most academic and other forms of original historical research; but it is frequently also undertaken (in conjunction with parallel research methodologies) in other disciplines within the humanities and social sciences, including literary studies, rhetoric, archaeology, sociology, human geography, anthropology, psychology, and organizational studies. It may also be important in other non-academic types of enquiry, such as the tracing of birth families by adoptees, and criminal investigations. Data held by archival institutions is also of use in scientific research and in establishing civil rights.
+In addition to discipline, the kind of research methodology used in archival research can vary depending on its organization and its materials. For example, in an archives that has a large number of materials still unprocessed, a researcher may find consulting directly with archive staff who have a clear understanding of collections and their organization to be useful as they can be a source of information regarding unprocessed materials or of related materials in other archives and repositories. When an archive is not entirely oriented towards one or relevant to a single discipline, researchers, for example genealogists, may rely upon formal or informal networks to support research by sharing information about specific archives' organization and collections with each other.
+
+== Conducting research at an archive ==
+
+Archival research is generally more complex and time-consuming than secondary research, presenting challenges in identifying, locating and interpreting relevant documents. Although archives share similar features and characteristics, they can also vary in significant ways. While publicly funded archives may have mandates that require them to be as accessible as possible, other kinds, such as corporate, religious, or private archives, will have varying degrees of access and discoverability. Some materials may be restricted in other ways, for example because they contain sensitive or classified information, because they are unpublished works, or because of agreements with the donor of the materials. Furthermore, archival records are often unique, and the researcher must be prepared to travel to reach them. Even when materials are available in digital formats, there may be restrictions that prohibit them from being accessed off-site.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Archival_research-1.md b/data/en.wikipedia.org/wiki/Archival_research-1.md
new file mode 100644
index 000000000..cbe75c942
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Archival_research-1.md
@@ -0,0 +1,28 @@
+---
+title: "Archival research"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Archival_research"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:51.960083+00:00"
+instance: "kb-cron"
+---
+
+=== Locating archival collections ===
+Prior to online search, union catalogs were an important tool for finding materials in libraries and archives. In the United States, the National Union Catalog and the National Union Catalog of Manuscript Collections have been used by researchers to locate archives, although much of their information has since been migrated to online systems.
+An increasing number of archival institutions can be found via an online search. In addition, portals such as Europeana, the Digital Public Library of America and the National Library of Australia's Trove provide links to member institutions.
+In the UK, JISC hosts the ArchivesHub, while the OCLC's ArchiveGrid provides an international portal for mostly library based institutions, which use MARC as a cataloguing tool for their holdings. The Association of Canadian Archivists (ACA) has partnered with the software company Artefactual to create ArchivesCanada, while the Australian Society of Archivists have used the same software for their Directory of Archives in Australia. Many other online search tools have been made available to facilitate search and discovery, including the Location Register of English Literary Manuscripts and Letters, the ArchiveSearch guide to archival materials in institutions in Cambridge, UK, and CARTOMAC: Archives littéraires d'Afrique.
+If an archives cannot be found through online search or a publicly listed collection, a researcher may have to track down its existence through other means, such as following other researchers' citations and references. This is particularly true for materials held by corporations or other organizations that may not employ an archivist and thus be unaware of the extent or contents of their materials.
+In very restricted archives, access may be limited to individuals with certain credentials or affiliations with institutions like universities, and then only to those of a certain level. Those lacking the necessary credentials may need to request letters of introduction from an individual or institution to provide to the archive.
+
+=== Locating materials within archives ===
+Archives usually contain unique materials and their organization may also be entirely unique or idiosyncratic to the institution or organization that maintains them. This is one important distinction with libraries where material is organized according to standardized classification systems. Traditionally, archives have followed the principle of respect des fonds in which the provenance and original order is maintained although some rearrangement, physical or intellectual, may be done by the archivist to facilitate its use. A basic guideline for archival description is the International Standard of Archival Description (General) (ISAD/G or ISAD), produced by the International Council on Archives (ICA). American institutions may also be guided by Describing Archives: a content standard (DACS) and in Canada by the Rules of Archival Description (RAD). Understanding how archival descriptions and finding aids are constructed is known as archival intelligence.
+In addition to these standards and rules for creating hard copy and online listings and catalogues, archivists may also provide access to their catalogues through APIs or through the encoding standards EAD (Encoded archival description) (relating to the fonds, series, and items) and EAC (Encoded archival context) (the organisations and people that created the archives).
+Finding aids are a common reference tool created by archivists for locating materials. They come in a variety of forms, such as registers, card catalogs, or inventories. Many finding aids to archival documents are now hosted online as web pages or uploaded as documents, such as at the Library of Congress' Rare Book & Special Collections.  The level of detail in finding aids can vary from granular item-level descriptions to coarse collection-level descriptions. If an archive has a large backlog of unprocessed materials, there may not be any kind of finding aid at all. From around 2005, an ideology known as "More Product, Less Process", or MPLP, has been adopted by many North American collecting archives seeking to reduce processing time or alleviate backlogs to provide access to materials sooner, the results of which may be minimally described finding aids.
+Although most archive repositories welcome researchers, and have professional staff tasked with assisting them, the large quantity of records means that finding aids may be of only limited usefulness: the researcher will need to hunt through large quantities of documents in search of material relevant to his or her particular enquiry. Some records may be closed to public access for reasons of confidentiality; and others may be written in archaic handwriting, in ancient or foreign languages, or in technical terminology. Archival documents were generally created for immediate practical or administrative purposes, not for the benefit of future researchers, and additional contextual research may be necessary to make sense of them. Many of these challenges are exacerbated when the records are still in the custody of the generating body or in private hands, where owners or custodians may be unwilling to provide access to external enquirers, and where finding aids may be even more rudimentary or non-existent.
+
+=== Consulting archival materials ===
+
+==== On-site ====
+
+Archival materials are usually held in closed stacks and are non-circulating. Users request to see specific materials from the archives and may only consult them on-site. After locating the relevant record location using a finding aid or other discovery tool, a user may then have to submit the request to the archives, such as by using a request form. If an archives has part of its holdings located in a separate building or facility, it may take days or weeks to retrieve materials, requiring a user to submit their requests in advance of an on-site consultation.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Archival_research-2.md b/data/en.wikipedia.org/wiki/Archival_research-2.md
new file mode 100644
index 000000000..ddf11e212
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Archival_research-2.md
@@ -0,0 +1,37 @@
+---
+title: "Archival research"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Archival_research"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:51.960083+00:00"
+instance: "kb-cron"
+---
+
+A reading room is a space, usually within or near the archive, where users can consult archival materials under staff supervision. The unique, fragile, or sensitive nature of some materials sometimes requires certain kinds of restrictions on their use, handling, and/or duplication. Many archives restrict what kinds of items can be brought into a reading room from outside, such as pencils, notepads, bags, and even clothing, to guard against theft or risk of damage to materials. Further restrictions may be placed on the number of materials that can be consulted at any given time, such as limiting a user to one box at a time and requiring all materials to be laid flat and visible at all times. Some archives provide basic supplies including scrap paper and pencils or foam wedges for supporting unusually large materials. Duplication services may be available at the archive, although the policies, costs, and time required can vary. Increasingly, archives also allow users to use their own devices, such as handheld cameras, cell phones, and even scanners, to duplicate materials. The use of white or other gloves, while popular in television programs, is not necessarily required for handling archival documents, because of concerns about the fragility of pages and text. Gloves may be required for handling volumes with poor bindings, to prevent transfer of dirt and other material, provided they are removed for handling the internal pages, and they should be used when handling photographs. Always check with the archivist as to whether gloves are required or not.
+Archives may also provide access to content via microfilm (including fiche and other formats) due to the fragility or popularity of the original archive. Digital copies may also be provided for the same reason. Before asking for access to the original, users should make sure that the items that have been reformatted are suitable for the use for which they are required. Reasons for asking for access to original content might include the need to view a colour image (architectural perspective and elevation drawings, maps and plans, etc.) or for accessibility reasons (minor visual vertigo is usually not considered a reason for access to originals, as the effect can be mitigated by slower perusal of the film).
+Some materials may contain information that concerns the privacy and confidentiality of living individuals, such as medical and student records, and demand special care. Materials that might contain personally identifiable information, such as social security numbers or names, must be handled appropriately, and an archive might provide redacted copies of materials or deny access to materials entirely due to privacy or other legislative concerns.
+
+==== Off-site and electronic materials ====
+More and more archival materials are being digitized or are born-digital enabling them to be accessed off-site through the internet or other networked services. Archives that have digital materials accessible to the public may make their holdings discoverable to internet search engines by sharing or exposing their electronic catalogs and/or metadata, using standards like the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Some institutions have online portals where users can freely access digital materials that have been made available by the archive such as the Archives of the New York Public Library or the Smithsonian Institution Archives. Governments and their related institutions may use these "electronic", or "virtual", reading rooms to upload documents and materials that have been requested by the public such as through FOIA requests or in accordance with records disclosure policies.
+
+== References ==
+
+== External links ==
+National Archives and Records Administration (NARA), United States of America
+NARA: "Research Our Records"
+The National Archives (TNA), United Kingdom
+TNA: "Help with your research"
+TNA: "How to use archives"
+Trace Your Birth Family In The UK
+"Archive skills and tools for historians" - Making History (Institute of Historical Research, University of London)
+Society of American Archivists: Using Archives: A Guide to Effective Research
+
+=== LibGuides on Archival Research ===
+Guide to Archival Research (Dalhousie University)
+Archival Research Guide (Georgetown University Library)
+A Guide to Archival Research (Emory Libraries)
+Introduction to Archival Research (Duke University Libraries)
+Archival Research: Why Archival Research (Georgia State University)
+Doing Archival Research (Williams College)
+Conducting Archival Research (University of the Witwatersrand)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Astronomical_chronology-0.md b/data/en.wikipedia.org/wiki/Astronomical_chronology-0.md
new file mode 100644
index 000000000..7ab4cfe5c
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Astronomical_chronology-0.md
@@ -0,0 +1,65 @@
+---
+title: "Astronomical chronology"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Astronomical_chronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:30.297138+00:00"
+instance: "kb-cron"
+---
+
+Astronomical chronology, or astronomical dating, is a technical method of dating events or artifacts that are associated with astronomical phenomena.  Written records of historical events that include descriptions of astronomical phenomena have done much to clarify the chronology of the Ancient Near East; works of art which depict the configuration of the stars and planets and buildings which are oriented to the rising and setting of celestial bodies at a particular time have all been dated through astronomical calculations.
+
+
+== Dating historical events ==
+
+The use of descriptions of astronomical phenomena to date historical events began in the 16th century, a time of a renewed humanistic interest in history and of increasingly precise astronomical tables. Eclipses in particular are relatively infrequent events and can be dated precisely.  When the circumstances are not exact and descriptions leave ambiguities, one can often use other details such as the month of the eclipse or the position of other stars and planets to identify the specific eclipse.
+Astronomical dating, like other forms of historical interpretation, requires care in interpreting the surviving written records.  John Steele has proposed three questions that must be asked when dating an event:  Does the record refer to an actual astronomical event, or is this merely a modern assumption?  If it does refer to an actual astronomical event, is the source reliable?  Can the record provide an unambiguous date without making unwarranted assumptions about ancient astronomical observational methods?
+Babylonian astronomical diaries provide detailed and unambiguous accounts of the positions of all the visible planets, often in relation to specific stars, that have been used to provide precise dates of events like the defeat of Darius III by Alexander the Great at the Battle of Gaugamela on 1 October 331 BCE and of Alexander's subsequent death on 11 June 323 BCE.
+Since the success of this method depends on the reliability of the written sources and the precision of their accounts of astronomical phenomena, attempts to date literary texts which may describe astronomical events loosely or even as metaphors have led researchers to conclusions that appear precise, but rely on invalid assumptions and are consequently less widely accepted. Thus the attempt to date Vedic texts describing the Pleiades as rising "due East" to about 2300 BCE, which is the time when the Pleiades rose "exactly" due East, is complicated by the fact that poetic descriptions need not be taken as reflecting precise astronomical observations, while precession is a very slow process which makes only small changes in the azimuth of a star rising in the East.
+
+
+== Dating artifacts ==
+
+Among the artifacts that can most readily be dated by astronomical techniques are depictions of the positions of the celestial bodies at a particular time. Since the celestial bodies all move with different periods, it takes many centuries for all the planets plus the Sun and the Moon to reach the same positions in the signs of the Zodiac. For a configuration accurate to ±15° (that is, within a single sign), the positions of these seven bodies will only return to the same configuration once in about 3700 years. A particular case involved a medieval illuminated manuscript which portrayed the position of these seven celestial bodies on 18 March 816, corresponding to the period when the manuscript was written. This calculation demonstrated that this illustration was not a copy of an earlier classical depiction of the position of the stars. The rapidly moving Moon is the most sensitive indicator for the exact time; if one can estimate the indicated position of the Moon to within a degree, the time of the diagram can be computed to within an hour.
+A striking example of this method was an astrological portrait of Sir Christopher Hatton (1540–1591), which depicted the positions of the seven classical planets in the zodiac and noted the computed positions of the planets to the nearest minute of arc.  Here the largest source of error in the date was the uncertainty of 16th-century astronomical calculations.  The resulting time was about noon of 12 December 1581.
+
+
+== Dating structures by their orientation ==
+A more controversial archaeoastronomical approach has been used to date structures that are believed to have been oriented on astronomical principles by measuring their orientation and computing the date in the past when a single specified celestial body, whether the Sun or a selected star, rises or sets at the measured azimuth. The astronomer Norman Lockyer applied this method to Stonehenge by measuring the orientation of the Stonehenge avenue and comparing it to the position of solstitial sunrise, which changes slowly due to the changing obliquity of the ecliptic. The archaeologist F. C. Penrose applied a similar method to ancient Greek temples, attempting to establish their dates by relating their orientation to the appearance of stars on the horizon, the position of which changes slowly due to the precession of the equinoxes.
+The wide variance of these dates from historically accepted ones led the architect and archaeologist William Bell Dinsmoor to mistrust dates established by the slowly changing obliquity of the ecliptic or by stellar alignments, which involve an arbitrary selection of a star that rises on the proper azimuth. Instead he proposed a method employing what was already known from historical records concerning the dates of construction of Greek temples, the festivals associated with specific temples, and the nature of the Greek lunisolar calendar. Since the date of a festival in the Greek lunisolar calendar only recurs on the same date in the solar calendar every eight or nineteen years, Dinsmoor identified a festival connected with a specific temple and was able to determine the exact year near the historically recorded construction date when the Sun rose in alignment with the temple on the date of the festival.
+
+
+== See also ==
+Astronomical chronology
+Age of the Earth
+Age of the universe
+Chronological dating, archaeological chronology
+Absolute dating
+Relative dating
+Phase (archaeology)
+Archaeological association
+Geochronology
+Geologic time scale
+Geological history of Earth
+
+
+== Notes ==
+
+
+== References ==
+Neugebauer, Otto. A History of Ancient Mathematical Astronomy, (3 vols). New York: Springer, 1975. Vol. 3, pp. 1071–1076 provides a brief introduction to astronomical chronology.
+
+
+== Bibliography ==
+Fraser, Gordon. Star Territory: Printing the Universe in Nineteenth-Century America. Material Texts. Philadelphia: University of Pennsylvania Press, 2021.
+Giovannetti-Singh, Gianamar. "Astronomical Chronology, the Jesuit China Mission, and Enlightenment History". Journal of the History of Ideas, 84(3) (2023): 487-510. https://doi.org/10.1353/jhi.2023.a901491
+Gingerich, Owen and Barbara Welther.  Planetary, Lunar, and Solar Positions, A. D. 1650 to 1805, Memoirs of the American Philosophical Society, 59S.  Philadelphia, 1983.
+Neugebauer, Paul V.  Astronomische Chronologie (2 vols).  Berlin: De Gruyter, 1929.
+Steele, John M.  "The Use and Abuse of Astronomy in Establishing Absolute Chronologies", Physics in Canada/La Physique au Canada, 59 (2003): 243-248.
+Tuckerman, Bryant.  Planetary, Lunar, and Solar Positions, 601 B.C. to A. D. 1, Memoirs of the American Philosophical Society, 56.  Philadelphia, 1962.
+Tuckerman, Bryant.  Planetary, Lunar, and Solar Positions, A. D. 2 to 1649, Memoirs of the American Philosophical Society, 59.  Philadelphia, 1964.
+
+
+== External links ==
+van Gent, R.H., Astronomical Chronology
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Attestation-0.md b/data/en.wikipedia.org/wiki/Attestation-0.md
new file mode 100644
index 000000000..5c94a7803
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Attestation-0.md
@@ -0,0 +1,30 @@
+---
+title: "Attestation"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Attestation"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:53.102961+00:00"
+instance: "kb-cron"
+---
+
+An attestation is something that serves to bear witness, confirm, authenticate or verify the validity of some fact or status. An attestor is someone who performs an attestation. An attestation date is the date on which an attestation is performed.
+
+
+== Examples ==
+Examples of attestations include:
+
+Testimony, a sworn verification of the truth of a set of factual statements
+An attestation clause, verifying a document
+A police oath or an oath of allegiance in armed forces of the United Kingdom, pledging loyalty or the faithful execution of duties
+A validation of the integrity of a computing device such as a server needed for trusted computing
+
+
+== See also ==
+The dictionary definition of attest at Wiktionary
+The dictionary definition of attestation at Wiktionary
+The dictionary definition of attestor at Wiktionary
+Attested language, a language for which documented evidence exists
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Bataan_Rice_Enrichment_Project-0.md b/data/en.wikipedia.org/wiki/Bataan_Rice_Enrichment_Project-0.md
new file mode 100644
index 000000000..03b6183d9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Bataan_Rice_Enrichment_Project-0.md
@@ -0,0 +1,40 @@
+---
+title: "Bataan Rice Enrichment Project"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Bataan_Rice_Enrichment_Project"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:30.770293+00:00"
+instance: "kb-cron"
+---
+
+The Bataan Rice Enrichment Project, or the Bataan Experiment, was a collaborative research venture between American chemist Robert R. Williams and Juan Salcedo Jr. It was a series of feeding experiments conducted in municipalities in Bataan between 1947 and 1949. The experiments showed that thiamine-enriched rice could reduce cases of beriberi in the Philippines, where the disease was a leading cause of death at the time.
+
+
+== Overview ==
+
+The enrichment project was first planned in 1943. At that time, Salcedo, who studied at Columbia University, met American chemist Robert R. Williams, a scientist renowned for his synthesis of vitamin B1 in 1935.
+
+The Philippine Bureau of Health reported a relatively stable beriberi rate from the mid-1920s to 1940. However, after World War II, beriberi cases surged, becoming the second leading cause of death in 1946 and 1947. Infants accounted for a significant portion of these deaths.
+At this time, Williams was disappointed by the efforts of agencies at the United Nations to curb the rise of beriberi cases around the globe. Because his previous rice enrichment programs had failed, Williams became desperate. Together with Salcedo, he began feeding experiments in the province of Bataan.
+The specific objectives of the feeding experiment were:
+
+Determine if enriched rice could effectively treat beriberi.
+Test the practicality of using enriched rice in the rice trade.
+Establish a system to ensure only enriched rice is sold.
+Promote the use of enriched rice among the people and explore its potential for widespread use throughout the Philippines.
+The experiments were conducted in Bataan, which was divided into two areas: an experimental zone and a control zone. After the introduction of thiamine-enriched rice, the experimental zone showed significant results. Beriberi mortality decreased significantly after the nutrient-enriched white rice was introduced, over the period from July 1, 1948, to June 30, 1950. Before the introduction, 167 deaths from beriberi occurred from July 1, 1947, to June 30, 1948; after the introduction, the number fell to just 18.
+
+
+== Human rights concerns ==
+Williams intentionally exposed half of Bataan's food-deficient population to beriberi, replicating the unethical experiments conducted by Euro-American researchers on prisoners and asylum patients. He also recreated prison camps and asylums to further persuade unwilling participants. This was seen as a form of colonial exploitation by both the Filipino people and nationalist physicians, who recognized beriberi as a symptom of the colonial system.
+The control group in the feeding experiments was also denied access to the enriched rice. This resulted in unwanted exposure to beriberi among research participants.
+
+
+== Reception and aftermath ==
+Because of the experiment's positive results, Williams and Salcedo planned in 1950 to expand the rice enrichment project throughout the Philippines. However, this met with opposition from the Philippine government despite Williams's insistence. During the 1950s, it was further delayed by the Hukbalahap rebellion.
+From then on, the government stopped subsidizing the project entirely, and the project had to rely on funding from Williams and support from a team from the Food and Agriculture Organization (FAO). From 1966 to 1970, FAO sponsored the introduction of high-yielding rice varieties into the Philippines. According to a 1971 report by FAO, the implementation of the rice enrichment projects in the Philippines, Taiwan, and Japan showed the "conflicting economic interests of millers,
+governments and consumers".
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Behavioral_experiment-0.md b/data/en.wikipedia.org/wiki/Behavioral_experiment-0.md
new file mode 100644
index 000000000..5d53394bd
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Behavioral_experiment-0.md
@@ -0,0 +1,34 @@
+---
+title: "Behavioral experiment"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Behavioral_experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:31.895442+00:00"
+instance: "kb-cron"
+---
+
+Technically, all scientific experiments measure a change in hypothesized causal behavior, and may drop the behavioral prefix.
+Behavioral experiment may refer to:
+
+Behavioral experiment (analysis)
+Behavioral experiment (animals), for controlling variables (vs. field studies)
+Behavioral experiment (cognitive science), for determining what constitutes intelligent behavior
+Behavioral experiment (cognitive therapy), method for cognitive restructuring
+Behavioral experiment (cognitive behavioral therapy), for testing the validity of negative and alternative thoughts in real-life situations
+Behavioral experiment (computational modeling), of computational model for comparison with human data
+Behavioral experiment (experimental psychology), for measuring reaction time, choices among alternatives, and/or response rate or strength
+Behavioral experiment (human reasoning), for studying human reasoning
+Behavioral experiment (conditional reasoning), on conditionals in the psychology of reasoning
+Behavioral experiment (psychotherapy), for identifying potentially negative or harmful beliefs
+
+
+== See also ==
+Behavioral experiments for monotropism
+Behaviorism, which is based on such experiments
+Experiment
+Category:Science experiments
+All pages with titles containing experiment
+Behavior
+All pages with titles containing behavior
+All pages with titles containing behaviour
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Burden_of_proof_(philosophy)-0.md b/data/en.wikipedia.org/wiki/Burden_of_proof_(philosophy)-0.md
new file mode 100644
index 000000000..8d7e5aa2b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Burden_of_proof_(philosophy)-0.md
@@ -0,0 +1,37 @@
+---
+title: "Burden of proof (philosophy)"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Burden_of_proof_(philosophy)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:54.272353+00:00"
+instance: "kb-cron"
+---
+
+The burden of proof (Latin: onus probandi, shortened from Onus probandi incumbit ei qui dicit, non ei qui negat – the burden of proof lies with the one who speaks, not the one who denies) is the obligation on a party in a dispute to provide sufficient warrant for its position.
+
+== Holder of the burden ==
+When two parties are in a discussion and one makes a claim that the other disputes, the one who makes the claim typically has a burden of proof to justify or substantiate that claim, especially when it challenges a perceived status quo. This is also stated in Hitchens's razor, which declares that "what may be asserted without evidence may be dismissed without evidence." Carl Sagan proposed a related criterion: "Extraordinary claims require extraordinary evidence".
+While certain kinds of arguments, such as logical syllogisms, require mathematical or strictly logical proofs, the standard for evidence to meet the burden of proof is usually determined by context and community standards and conventions.
+Philosophical debate can devolve into arguing about who has the burden of proof about a particular claim. This has been described as "burden tennis" or the "onus game".
+
+== Shifting the burden of proof ==
+One way in which one would attempt to shift the burden of proof is by committing a logical fallacy known as the argument from ignorance. It occurs when either a proposition is assumed to be true because it has not yet been proven false or a proposition is assumed to be false because it has not yet been proven true.
+
+== Proving a negative ==
+A negative claim is the opposite of an affirmative or positive claim.  It asserts the non-existence or exclusion of something.
+Logicians and philosophers of logic reject the notion that it is intrinsically impossible to prove negative claims. Philosophers Steven D. Hales and Stephen Law state that the phrase "you cannot prove a negative" is itself a negative claim that would not be true if it could be proven true. Many negative claims can be rewritten into logically equivalent positive claims (for example, "No Jewish person was at the party" is logically equivalent to "Everyone at the party was a gentile"). In formal logic and mathematics, the negation of a proposition can be proven using procedures such as modus tollens and reductio ad absurdum. In empirical contexts (such as evaluating the existence or nonexistence of unicorns), inductive reasoning is often used for establishing the plausibility of a claim based on observed evidence. Though inductive reasoning may not provide absolute certainty about negative claims, this is only due to the nature of inductive reasoning; inductive reasoning provides proof from probability rather than certainty. Inductive reasoning also does not provide absolute certainty about positive claims.  
+A negative claim may or may not exist as a counterpoint to a previous claim. A proof of impossibility or an evidence of absence argument are typical methods to fulfill the burden of proof for a negative claim.
+
+== Application ==
+
+=== In public discourse ===
+Burden of proof is an important concept in the public arena of ideas. Once participants in discourse establish common assumptions, the mechanism of burden of proof helps to ensure that all parties contribute productively, using relevant arguments.
+
+=== In law ===
+
+In a legal dispute, one party is initially presumed to be correct and gets the benefit of the doubt, while the other side bears the burden of proof. When a party bearing the burden of proof meets their burden, the burden of proof switches to the other side. Burdens may be of different kinds for each party, in different phases of litigation. The burden of production is a minimal burden to produce at least enough evidence for the trier of fact to consider a disputed claim. After litigants have met the burden of production and their claim is being considered by a trier of fact, they have the burden of persuasion, that enough evidence has been presented to persuade the trier of fact that their side is correct. There are different standards of persuasiveness ranging from a preponderance of the evidence, where there is just enough evidence to tip the balance, to proof beyond a reasonable doubt, as in United States criminal courts.
+The burden of proof is usually on the person who brings a claim in a dispute. It is often associated with the Latin maxim semper necessitas probandi incumbit ei qui agit, a translation of which in this context is: "the necessity of proof always lies with the person who lays charges."
+The party that does not carry the burden of proof has the benefit of assumption: it is presumed to be correct until the burden shifts after presentation of evidence by the party bringing the action. An example is an American criminal case, where there is a presumption of innocence for the defendant. Fulfilling the burden of proof effectively captures the benefit of assumption, passing the burden of proof off to another party.
+
+=== In statistics ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Burden_of_proof_(philosophy)-1.md b/data/en.wikipedia.org/wiki/Burden_of_proof_(philosophy)-1.md
new file mode 100644
index 000000000..39f765bf2
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Burden_of_proof_(philosophy)-1.md
@@ -0,0 +1,20 @@
+---
+title: "Burden of proof (philosophy)"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Burden_of_proof_(philosophy)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:54.272353+00:00"
+instance: "kb-cron"
+---
+
+In inferential statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups. Rejecting or disproving the null hypothesis—and thus concluding that there are grounds for believing that there is a relationship between two phenomena (e.g. that a potential treatment has a measurable effect)—is a central task in the modern practice of science; the field of statistics gives precise criteria for rejecting a null hypothesis.
+The null hypothesis is generally assumed to be true until evidence indicates otherwise.  In statistics, it is often denoted H0 (read "H-nought", "H-null", "H-oh", or "H-zero").
+The concept of a null hypothesis is used differently in two approaches to statistical inference. In the significance testing approach of Ronald Fisher, a null hypothesis is rejected if the observed data are significantly unlikely to have occurred if the null hypothesis were true. In this case the null hypothesis is rejected and an alternative hypothesis is accepted in its place. If the data are consistent with the null hypothesis, then the null hypothesis is not rejected. In neither case is the null hypothesis or its alternative proven; the null hypothesis is tested with data and a decision is made based on how likely or unlikely the data are. This is analogous to the legal principle of presumption of innocence, in which a suspect or defendant is assumed to be innocent (null is not rejected) until proven guilty (null is rejected) beyond a reasonable doubt (to a statistically significant degree).
+In the hypothesis testing approach of Jerzy Neyman and Egon Pearson, a null hypothesis is contrasted with an alternative hypothesis and the two hypotheses are distinguished on the basis of data, with certain error rates.
+Proponents of each approach criticize the other approach. Nowadays, though, a hybrid approach is widely practiced and presented in textbooks. The hybrid is in turn criticized as incorrect and incoherent—for details, see Statistical hypothesis testing.
+Statistical inference can be done without a null hypothesis, by specifying a statistical model corresponding to each candidate hypothesis and using model selection techniques to choose the most appropriate model.  (The most common selection techniques are based on either Akaike information criterion or Bayes factor.)
+
+== See also ==
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Case_study-0.md b/data/en.wikipedia.org/wiki/Case_study-0.md
index 1cc4fd61c..b0a2698fe 100644
--- a/data/en.wikipedia.org/wiki/Case_study-0.md
+++ b/data/en.wikipedia.org/wiki/Case_study-0.md
@@ -4,7 +4,7 @@ chunk: 1/3
 source: "https://en.wikipedia.org/wiki/Case_study"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:27:12.796667+00:00"
+date_saved: "2026-05-05T09:55:55.504712+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Case_study-1.md b/data/en.wikipedia.org/wiki/Case_study-1.md
index 24b9b151c..036db35b2 100644
--- a/data/en.wikipedia.org/wiki/Case_study-1.md
+++ b/data/en.wikipedia.org/wiki/Case_study-1.md
@@ -4,7 +4,7 @@ chunk: 2/3
 source: "https://en.wikipedia.org/wiki/Case_study"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:27:12.796667+00:00"
+date_saved: "2026-05-05T09:55:55.504712+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Case_study-2.md b/data/en.wikipedia.org/wiki/Case_study-2.md
index 39edb06b9..c992eb07b 100644
--- a/data/en.wikipedia.org/wiki/Case_study-2.md
+++ b/data/en.wikipedia.org/wiki/Case_study-2.md
@@ -4,7 +4,7 @@ chunk: 3/3
 source: "https://en.wikipedia.org/wiki/Case_study"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:27:12.796667+00:00"
+date_saved: "2026-05-05T09:55:55.504712+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Cherry_picking-0.md b/data/en.wikipedia.org/wiki/Cherry_picking-0.md
new file mode 100644
index 000000000..d49ba4d56
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Cherry_picking-0.md
@@ -0,0 +1,45 @@
+---
+title: "Cherry picking"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Cherry_picking"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:56.680790+00:00"
+instance: "kb-cron"
+---
+
+Cherry picking, suppressing evidence, or the fallacy of incomplete evidence is the act of pointing to individual cases or data that seem to confirm a particular position while ignoring a significant portion of related and similar cases or data that may contradict that position. Cherry picking may be committed intentionally or unintentionally.
+
+== Name ==
+
+The term is based on the perceived process of harvesting fruit, such as cherries. The picker would be expected to select only the ripest and healthiest fruits. An observer who sees only the selected fruit may thus wrongly conclude that most, or even all, of the tree's fruit is in a likewise good condition. This can also give a false impression of the quality of the fruit (since it is only a sample and is not a representative sample). A concept sometimes confused with cherry picking is the idea of gathering only the fruit that is easy to harvest, while ignoring other fruit that is higher up on the tree and thus more difficult to obtain (see low-hanging fruit).
+Cherry picking has a negative connotation as the practice neglects, overlooks or directly suppresses evidence that could lead to a complete picture.
+Cherry picking can be found in many logical fallacies. For example, the "fallacy of anecdotal evidence" tends to overlook large amounts of data in favor of that known personally, "selective use of evidence" rejects material unfavorable to an argument, while a false dichotomy picks only two options when more are available. Some scholars classify cherry-picking as a fallacy of selective attention, the most common example of which is the confirmation bias.  Cherry picking can refer to the selection of data or data sets so a study or survey will give desired, predictable results which may be misleading or even completely contrary to reality.
+
+== History ==
+A story about the 5th century BCE atheist philosopher Diagoras of Melos says how, when shown the votive gifts of people who had supposedly escaped death by shipwreck by praying to gods, he pointed out that many people had died at sea in spite of their prayers, yet these cases were not likewise commemorated (this is an example of survivorship bias).
+Michel de Montaigne (1533–1592) in his essay on prophecies comments on people willing to believe in the validity of supposed seers: 
+
+I see some who are mightily given to study and comment upon their almanacs, and produce them to us as an authority when anything has fallen out pat; and, for that matter, it is hardly possible but that these alleged authorities sometimes stumble upon a truth amongst an infinite number of lies. ... I think never the better of them for some such accidental hit. ... [N]obody records their flimflams and false prognostics, forasmuch as they are infinite and common; but if they chop upon one truth, that carries a mighty report, as being rare, incredible, and prodigious.
+
+== In science ==
+Cherry picking is one of the epistemological characteristics of denialism and widely used by different science denialists to seemingly contradict scientific findings. For example, it is used in climate change denial, evolution denial by creationists, denial of the negative health effects of consuming tobacco products and of passive smoking. P-hacking may also be considered a form of cherry-picking.
+
+ Choosing to make selective choices among competing evidence, so as to emphasize those results that support a given position, while ignoring or dismissing any findings that do not support it, is a practice known as "cherry picking" and is a hallmark of poor science or pseudo-science.
+
+ Rigorous science looks at all the evidence (rather than cherry picking only favorable evidence), controls for variables as to identify what is actually working, uses blinded observations so as to minimize the effects of bias, and uses internally consistent logic.
+ 
+
+== In medicine ==
+In a 2002 study, a review of previous medical data found cherry picking in tests of anti-depression medication:
+
+[researchers] reviewed 31 antidepressant efficacy trials to identify the primary exclusion criteria used in determining eligibility for participation. Their findings suggest that patients in current antidepressant trials represent only a minority of patients treated in routine clinical practice for depression. Excluding potential clinical trial subjects with certain profiles means that the ability to generalize the results of antidepressant efficacy trials lacks empirical support, according to the authors.
+
+== In argumentation ==
+In argumentation, the practice of "quote mining" is a form of cherry picking, in which the debater selectively picks some quotes supporting a position (or exaggerating an opposing position) while ignoring those that moderate the original quote or put it into a different context. Cherry picking in debates is a large problem as the facts themselves are true but need to be put in context. Because research cannot be done live and is often untimely, cherry-picked facts or quotes usually stick in the public mainstream and, even when corrected, lead to widespread misrepresentation of the groups targeted.
+
+=== One-sided argument ===
+A one-sided argument (also known as card stacking, stacking the deck, ignoring the counter-evidence, slanting, and suppressed evidence) is an informal fallacy that occurs when only the reasons supporting a proposition are supplied, while all reasons opposing it are omitted.
+
+Philosophy professor Peter Suber has written: The one-sidedness fallacy does not make an argument invalid. It may not even make the argument unsound. The fallacy consists in persuading readers, and perhaps ourselves, that we have said enough to tilt the scale of evidence and therefore enough to justify a judgment. If we have been one-sided, though, then we haven't yet said enough to justify a judgment. The arguments on the other side may be stronger than our own. We won't know until we examine them. 
+So the one-sidedness fallacy doesn't mean that your premises are false or irrelevant, only that they are incomplete.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Cherry_picking-1.md b/data/en.wikipedia.org/wiki/Cherry_picking-1.md
new file mode 100644
index 000000000..1ed923748
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Cherry_picking-1.md
@@ -0,0 +1,18 @@
+---
+title: "Cherry picking"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Cherry_picking"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:56.680790+00:00"
+instance: "kb-cron"
+---
+
+[…] You might think that one-sidedness is actually desirable when your goal is winning rather than discovering a complex and nuanced truth. If this is true, then it's true of every fallacy. If winning is persuading a decision-maker, then any kind of manipulation or deception that actually works is desirable. But in fact, while winning may sometimes be served by one-sidedness, it is usually better served by two-sidedness. If your argument (say) in court is one-sided, then you are likely to be surprised by a strong counter-argument for which you are unprepared. The lesson is to cultivate two-sidedness in your thinking about any issue. Beware of any job that requires you to truncate your own understanding.
+Card stacking is a propaganda technique that seeks to manipulate audience perception of an issue by emphasizing one side and repressing another. Such emphasis may be achieved through media bias or the use of one-sided testimonials, or by simply censoring the voices of critics. The technique is commonly used in speeches by political candidates to discredit their opponents and to make themselves seem more worthy.
+The term originates from the magician's gimmick of "stacking the deck", which involves presenting a deck of cards that appears to have been randomly shuffled but which is, in fact, 'stacked' in a specific order. The magician knows the order and is able to control the outcome of the trick. In poker, cards can be stacked so that certain hands are dealt to certain players.
+The phenomenon can be applied to any subject and has wide applications. Wherever a broad spectrum of information exists, appearances can be influenced by highlighting some facts and ignoring others. Card stacking can be a tool of advocacy groups or of those groups with specific agendas. For example, an enlistment poster might focus upon an impressive picture, with words such as "travel" and "adventure", while placing the words, "enlist for two to four years" at the bottom in a smaller and less noticeable font size.
+
+== See also ==
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Chronological_dating-0.md b/data/en.wikipedia.org/wiki/Chronological_dating-0.md
new file mode 100644
index 000000000..328cfa2b9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Chronological_dating-0.md
@@ -0,0 +1,81 @@
+---
+title: "Chronological dating"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Chronological_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:24.404740+00:00"
+instance: "kb-cron"
+---
+
+Chronological dating, or simply dating, is the process of attributing to an object or event a date in the past, allowing such object or event to be located in a previously established chronology. This usually requires what is commonly known as a "dating method". Several dating methods exist, depending on different criteria and techniques. Disciplines that use such techniques include history, geology, paleontology, archaeology, astronomy and even forensic science, since in the latter it is sometimes necessary to determine the moment in the past at which a death occurred. These methods are typically identified as absolute, which involves a specified date or date range, or relative, which refers to dating which places artifacts or events on a timeline relative to other events and/or artifacts. Other markers can help place an artifact or event in a chronology, such as nearby writings and stratigraphic markers.
+
+== Absolute and relative dating ==
+Dating methods are most commonly classified following two criteria: relative dating and absolute dating.
+
+=== Relative dating ===
+
+Relative dating methods are unable to determine the absolute age of an object or event, but can determine the impossibility of a particular event happening before or after another event of which the absolute date is well known. In this relative dating method, Latin terms ante quem and post quem are usually used to indicate both the most recent and the oldest possible moments when an event occurred or an artifact was left in a stratum, respectively. But this method is also useful in many other disciplines. Historians, for example, know that Shakespeare's play Henry V was not written before 1587 because Shakespeare's primary source for writing his play was the second edition of Raphael Holinshed's Chronicles, not published until 1587. Thus, 1587 is the post quem dating of Shakespeare's play Henry V. That means that the play was without fail written after (in Latin, post) 1587.
+The same inductive mechanism is applied in archaeology, geology and paleontology, in many ways. For example, in a stratum presenting difficulties or ambiguities to absolute dating, paleopalynology can be used as a relative referent by means of the study of the pollens found in the stratum. This works because some botanical species, whether extinct or not, are well known as belonging to a determined position in the scale of time.
+For a non-exhaustive list of relative dating methods and relative dating applications used in geology, paleontology or archaeology, see the following:
+
+Cross-cutting relationships
+Fluorine absorption dating
+Harris matrix
+Law of included fragments
+Law of superposition
+Lichenometry
+Marine isotope stages, based on the oxygen isotope ratio cycle
+Melt inclusions
+Morphology (archaeology)
+Nitrogen dating
+Palynology, the study of modern-dated pollens for the relative dating of archaeological strata, also used in forensic palynology.
+Paleomagnetism
+Paleopalynology, also spelt "Palaeopalynology", the study of fossilized pollens for the relative dating of geological strata.
+Principle of original horizontality
+Principle of lateral continuity
+Principle of faunal succession
+Seriation (archaeology)
+Sequence dating (a type of seriation)
+Tephrochronology
+Typology (archaeology)
+Lead corrosion dating (exclusively used in archaeology)
+Varnish microlamination
+Vole clock
+
+=== Absolute dating ===
+
+Absolute dating methods seek to establish a specific time during which an object originated or an event took place. While the results of these techniques are largely accepted within the scientific community, several factors can hinder accurate absolute dating, including sampling errors and geological disruptions. This type of chronological dating utilizes absolute referent criteria, mainly the radiometric dating methods. Material remains can be absolutely dated by studying the organic materials which construct the remains. For example, remains that have pieces of brick can undergo the process of thermoluminescence (TL) dating in order to determine approximately how many years ago the material was fired. This technique was used to determine the date of St. James Church in Toruń by testing the thermoluminescence of removed bricks. In this example, an absolute date was determined which filled a gap in the historical knowledge of the church.
+These techniques are utilized in many other fields as well. Geologists, for example, apply absolute dating methods to rock sediment in order to discover its period of origin.
+Some examples of both radiometric and non-radiometric absolute dating methods are the following:
+
+Amino acid dating
+Archaeomagnetic dating
+Argon–argon dating
+Astronomical chronology
+Carbon dating: Also known as radiocarbon dating, it can reveal the age of organic material in artifacts as well as human and animal remains. This process can reliably measure dates up to approximately 50,000 years ago.
+Cementochronology: this method does not determine a precise moment in a scale of time but the age at death of an individual.
+Datestone (exclusively used in archaeology)
+Dendrochronology
+Electron spin resonance dating
+Fission track dating
+Geochronology
+Herbchronology
+Iodine–xenon dating
+Potassium–argon dating
+Lead–lead dating
+Luminescence dating
+Thermoluminescence dating
+Optically stimulated luminescence
+Optically stimulated luminescence thermochronometry
+Molecular clock (used mostly in phylogenetics and evolutionary biology)
+Obsidian hydration dating (exclusively used in archaeology)
+Oxidizable carbon ratio dating
+Rehydroxylation dating
+Rubidium–strontium dating
+Samarium–neodymium dating
+Tephrochronology
+Uranium–lead dating
+Uranium–thorium dating
+Uranium–uranium dating, useful in dating samples between about 10,000 and 2 million years Before Present (BP), or up to about eight times the half-life of 234U.
+Wiggle matching
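+
+The age limits quoted in the list above for radiocarbon and uranium–uranium dating can be checked with simple decay arithmetic. The half-lives used here (about 5,730 years for 14C and about 245,500 years for 234U) are commonly cited values assumed for this sketch; they are not stated in the text above:
+
+```latex
+\frac{N(t)}{N_0} = 2^{-t/t_{1/2}}, \qquad
+\left.\frac{N}{N_0}\right|_{^{14}\mathrm{C},\ t = 50{,}000\ \mathrm{yr}} = 2^{-50000/5730} \approx 2^{-8.7} \approx 0.0024, \qquad
+8 \times t_{1/2}\!\left(^{234}\mathrm{U}\right) \approx 8 \times 245{,}500\ \mathrm{yr} \approx 1.96\ \text{million yr}
+```
+
+In other words, after roughly 50,000 years only about a quarter of one percent of the original 14C remains, which is consistent with that age being the practical limit of the method, and eight half-lives of 234U does come to about 2 million years.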
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Chronological_dating-1.md b/data/en.wikipedia.org/wiki/Chronological_dating-1.md
new file mode 100644
index 000000000..6a41997c6
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Chronological_dating-1.md
@@ -0,0 +1,42 @@
+---
+title: "Chronological dating"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Chronological_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:24.404740+00:00"
+instance: "kb-cron"
+---
+
+== Dating methods in archaeology ==
+Just like geologists or paleontologists, archaeologists are also called upon to determine the age of both ancient and recent humans. Thus, to be considered archaeological, the remains, objects or artifacts to be dated must be related to human activity. It is commonly assumed that if the remains or elements to be dated are older than the human species, the disciplines which study them are sciences such as geology or paleontology, among others.
+Nevertheless, the range of time covered by archaeological dating can be enormous compared to the average lifespan of a single human being. For example, the caves at Pinnacle Point, on the southern coast of South Africa, provided evidence that marine resources (shellfish) have been regularly exploited by humans since 170,000 years ago. On the other hand, remains as recent as a hundred years old can also be the target of archaeological dating methods. This was the case for an 18th-century sloop excavated in South Carolina (United States) in 1992. Thus, from the oldest to the youngest, all archaeological sites are likely to be datable by an appropriate method.
+Material drawn from the archaeological record can be dated by direct study of an artifact, by association with materials found in the context the item is drawn from, or by inference from its point of discovery in the sequence relative to datable contexts. Dating is carried out mainly post-excavation, but to support good practice, some preliminary dating work called "spot dating" is usually run in tandem with excavation. Dating is very important in archaeology for constructing models of the past, and it relies on the integrity of dateable objects and samples. Many disciplines of archaeological science are concerned with dating evidence, and in practice several different dating techniques must sometimes be applied together. Dating much of an archaeological sequence recorded during excavation therefore requires matching information from known absolute dates or associated contexts, together with a careful study of stratigraphic relationships.
+In addition, because of its particular relation with past human presence or past human activity, archaeology uses almost all the dating methods that it shares with the other sciences, but with some particular variations, like the following:
+
+=== Written markers ===
+Epigraphy – analysis of inscriptions, via identifying graphemes, clarifying their meanings, classifying their uses according to dates and cultural contexts, and drawing conclusions about the writing and the writers.
+Numismatics – many coins have the date of their production written on them or their use is specified in the historical record.
+Palaeography – the study of ancient writing, including the practice of deciphering, reading, and dating historical manuscripts.
+
+=== Seriation ===
+Seriation is a relative dating method (see, above, the list of relative dating methods). An example of a practical application of seriation is the comparison of the known styles of artifacts such as stone tools or pottery.
+
+=== Age-equivalent stratigraphic markers ===
+Paleomagnetism (a relative dating method, see the corresponding list above)
+Marine isotope stages based on the oxygen isotope ratio cycle (a relative dating method, see the corresponding list above)
+Tephrochronology (an absolute dating method, see the corresponding list above)
+
+=== Stratigraphic relationships ===
+The stratigraphy of an archaeological site can be used to date, or refine the date, of particular activities ("contexts") on that site. For example, if a context is sealed between two other contexts of known date, it can be inferred that the middle context must date to between those dates.
+
+== See also ==
+Astronomical chronology
+Age of Earth
+Age of the universe
+Geochronology
+Geologic time scale
+Geological history of Earth
+Archaeological science
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates-0.md b/data/en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates-0.md
new file mode 100644
index 000000000..999960798
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates-0.md
@@ -0,0 +1,31 @@
+---
+title: "Cloud seeding in the United Arab Emirates"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:33.076239+00:00"
+instance: "kb-cron"
+---
+
+Cloud seeding in the United Arab Emirates is a weather modification technique used by the government to address water challenges in the country. Cloud seeding is also referred to as man-made precipitation and artificial rainmaking. The United Arab Emirates is one of the first countries in the Persian Gulf region to use cloud seeding technology. UAE scientists use cloud seeding technology to address the country's water insecurity, which stems from the extremely hot climate. They use weather radars to continuously monitor the atmosphere of the country. Forecasters and scientists have estimated that cloud seeding operations can enhance rainfall by as much as 30–35% in a clear atmosphere, and by up to 10–15% in a more humid atmosphere. This practice has raised concerns about its impact on the environment because its long-term global implications are difficult to predict.
+
+== Climate needs ==
+The UAE has an arid climate with less than 100 mm of rainfall per year, a high evaporation rate of surface water and a low groundwater recharge rate. Rainfall in the UAE has fluctuated over the last few decades during the winter season, between December and March.
+The climate of the UAE is very dry aside from the coast and the border between the UAE and Oman, where there is high humidity. The UAE is located in a dust hotspot that contributes to the arid climate. Rainfall is scarce: what there is comes mainly from frontal systems from the west and northwest, which yield only a few inches per year. This lack of rainfall has scientists and the government worried about water security in the future.
+Due to industrialization and population growth, the demand for water has rapidly increased. Current resources are being depleted and scarcity issues are arising as climate change increases evaporation rates, causing drought. As a result, the UAE is looking to cloud seeding technologies to increase water security as well as renewability to combat water and food scarcity that may arise. Research has predicted that drought frequencies and temperatures will continue increasing and cloud seeding hopes to provide an additional method of mitigation against future climate change.
+
+== History ==
+Scientists have been experimenting with cloud seeding technology since the 1940s. The cloud-seeding program in the UAE was initiated in the late 1990s, making the UAE one of the first Middle Eastern countries to use this technique. In 2005, the UAE launched the UAE Prize for Excellence in Advancing the Science and Practice of Weather Modification in collaboration with the World Meteorological Organization (WMO). In 2010, cloud seeding began as a project by weather authorities to create artificial rain. The project, which began in July 2010 and cost US$11 million, succeeded in creating rain storms in the Dubai and Abu Dhabi deserts.
+
+=== Government involvement ===
+The UAE government developed a research program called the UAE Research Program for Rain Enhancement Science (UAEREP) in 2015. It allows scientists and researchers to pitch their potential solutions and conduct research to improve the accuracy of cloud seeding technology. After pitching research proposals, scientists are awarded grants through the UAEREP. Among its key goals are advancing the science, technology, and implementation of rain enhancement and encouraging additional investments in research funding and research partnerships to advance the field, increasing rainfall and ensuring water security globally. By early 2001, the UAEREP was conducting research projects in cooperation with the National Center for Atmospheric Research (NCAR) in the U.S., the Witwatersrand University in South Africa, and the National Aeronautics and Space Administration (NASA) in the U.S.
+The Program for Rain Enhancement Science is an initiative of the United Arab Emirates Ministry of Presidential Affairs. It is overseen by the UAE National Center of Meteorology & Seismology (NCMS) based in Abu Dhabi.
+In 2014, a total of 187 missions were sent to seed clouds in the UAE, with each aircraft taking about three hours to target five to six clouds at a cost of $3,000 per operation. The UAE flew 214 missions in 2017, 184 in 2018, and 247 in 2019. In 2020, new technologies were tested with partners in the United States, including the use of nanomaterials for seeding.
+
+== Technology ==
+
+The augmentation of rainfall considers both the ground-based and airborne processes that occur in different rain cloud types (but generally focuses on convective clouds). The UAE uses operational aircraft-based and drone-controlled hygroscopic cloud seeding as opposed to conventional randomized aircraft seeding, as the latter does not take into consideration the varying properties of rain clouds, which are especially pronounced in dusty and arid regions like the UAE. Since 2021, the devices have been equipped with a payload of electric-charge emission instruments and customized sensors that fly at low altitudes and deliver an electric charge to air molecules. Hygroscopic cloud seeding uses hygroscopic flares to release natural salts, such as potassium chloride and sodium chloride, that already exist in the atmosphere. Introducing hygroscopic particles enhances the natural rain particles, which begins a collision-coalescence process.
+At present, the UAE mostly seeds clouds in the eastern mountains on the border with Oman to raise levels in aquifers and reservoirs. There are 75 networked automatic weather stations distributed across the country, 7 air quality stations, a Doppler weather radar network of five stationary radars and one mobile radar, and six Beechcraft King Air C90 aircraft distributed across the country for cloud seeding operations.
+
+== Environmental impact ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates-1.md b/data/en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates-1.md
new file mode 100644
index 000000000..591b11821
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates-1.md
@@ -0,0 +1,32 @@
+---
+title: "Cloud seeding in the United Arab Emirates"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Cloud_seeding_in_the_United_Arab_Emirates"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:33.076239+00:00"
+instance: "kb-cron"
+---
+
+=== Flooding ===
+It is predicted that climate change will lead to higher temperatures, increased humidity and a greater risk of flooding in parts of the Gulf region. These issues could be worsened in nations like the UAE which do not have adequate drainage infrastructure to manage heavy rainfall.
+Cloud seeding activities conducted in 2019 by the UAE National Center of Meteorology & Seismology (NCM) as part of the UAE Research Program for Rain Enhancement Science were carried out prior to floods in Dubai in 2019. Although the NCM has linked heavier rainfall to cloud seeding operations, it asserts that cloud seeding was not the cause of the flooding. Commercial and residential areas were severely impacted, and pumps were needed to remove excess water because drainage systems could not handle the volume. The UAE planned to invest 500 million dirhams ($136.1 million) in flood protection and transport infrastructure after severe storms in 2020.
+Sharjah, one of the most populous cities in the UAE, has experienced repetitive urban flooding during the rainy season over the last three decades. Possible additional increased rainfall intensity due to cloud seeding would require additional investment in the city's drainage systems to mitigate flood risk.
+
+==== April 2024 floods ====
+Experts are doubtful that cloud seeding played a role in the UAE's April 2024 floods, suggesting that the heavy rainfall was more likely caused by anthropogenic climate change. 
+
+=== Atmospheric aerosols ===
+Cloud seeding missions require firing salts and silver iodide crystals into the atmosphere. The increased concentration of particulate matter, or micro-pollutants, increases the risk of respiratory illnesses. In 2017, a study conducted before and after cloud seeding missions recorded an increase in particulate matter that correlated with the months of active artificial rain. Researchers attribute this to leftover silver iodide crystals that were not dispersed in the rain during the cloud seeding months. A study called the UAE Unified Aerosol Experiment (UAE2) was conducted to assess the progress and effectiveness of cloud seeding specifically in the UAE. Researchers found a significant increase in rainfall trends in areas with cloud seeding. More recently, over 20 regions in the UAE that participated in cloud seeding experiments have shown a higher concentration of particulate matter. The overall environmental impact of cloud seeding is difficult to measure because controlled experiments cannot be performed and direct tracing is difficult.
+
+== See also ==
+Cloud seeding
+United Arab Emirates
+Environmental issues in the United Arab Emirates
+Arabian Desert
+Abu Dhabi
+Dubai Electricity and Water Authority
+Sharjah Electricity and Water Authority
+Particulates
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Consilience-0.md b/data/en.wikipedia.org/wiki/Consilience-0.md
index fa4139ab6..806169433 100644
--- a/data/en.wikipedia.org/wiki/Consilience-0.md
+++ b/data/en.wikipedia.org/wiki/Consilience-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/Consilience"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:07:48.342671+00:00"
+date_saved: "2026-05-05T09:55:57.901867+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Consilience-1.md b/data/en.wikipedia.org/wiki/Consilience-1.md
index 3c405a1f6..fee3b96f9 100644
--- a/data/en.wikipedia.org/wiki/Consilience-1.md
+++ b/data/en.wikipedia.org/wiki/Consilience-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/Consilience"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:07:48.342671+00:00"
+date_saved: "2026-05-05T09:55:57.901867+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Control_variable-0.md b/data/en.wikipedia.org/wiki/Control_variable-0.md
new file mode 100644
index 000000000..81c04e527
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Control_variable-0.md
@@ -0,0 +1,63 @@
+---
+title: "Control variable"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Control_variable"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:34.251757+00:00"
+instance: "kb-cron"
+---
+
+A control variable (or scientific constant) in scientific experimentation is an experimental element which is constant (controlled) and unchanged throughout the course of the investigation. Control variables could strongly influence experimental results were they not held constant during the experiment in order to test the relative relationship of the dependent variable (DV) and independent variable (IV). The control variables themselves are not of primary interest to the experimenter.
+"Good controls", also known as “confounders” or “deconfounders”, are variables which are theorized to be unaffected by the treatment and which are intended to eliminate omitted-variable bias. "Bad controls", on the other hand, are variables that could be affected by the treatment, might contribute to collider bias, and lead to erroneous results.
+
+
+== Usage ==
+A control variable is a variable in an experiment which is held constant in order to assess the relationship between multiple other variables. It is not changed throughout the experiment because its unchanging state allows better understanding of the relationship between the other variables being tested.
+In any system existing in a natural state, many variables may be interdependent, with each affecting the other.  Scientific experiments test the relationship of an IV (or independent variable: that element that is manipulated by the experimenter) to the DV (or dependent variable: that element affected by the manipulation of the IV).  Any additional independent variable can be a control variable.  
+A control variable is an experimental condition or element that is kept the same throughout the experiment, and it is not of primary concern in the experiment, nor will it influence the outcome of the experiment. Any unexpected (i.e., uncontrolled) change in a control variable during an experiment would invalidate the correlation of dependent variables (DV) to the independent variable (IV), thus skewing the results, and invalidating the working hypothesis. This indicates the presence of a spurious relationship existing within experimental parameters. Unexpected results may result from the presence of a confounding variable, thus requiring a re-working of the initial experimental hypothesis. Confounding variables are a threat to the internal validity of an experiment. This situation may be resolved by first identifying the confounding variable and then redesigning the experiment taking that information into consideration. One way to do this is to control the confounding variable, thus making it a control variable. If, however, the spurious relationship cannot be identified, the working hypothesis may have to be abandoned.
+
+
+== Experimental examples ==
+Take, for example, the well known combined gas law, which is stated mathematically as:
+
+$$\frac{PV}{T}=k$$
+
+where:
+
+P is the pressure
+V is the volume
+T is the thermodynamic temperature measured in kelvins
+k is a constant (with units of energy divided by temperature).
+which shows that the ratio between the pressure-volume product and the temperature of a system remains constant.
+In an experimental verification of parts of the combined gas law (PV/T = k), where pressure, temperature, and volume are all variables, testing the resultant changes to any of these variables requires at least one to be kept constant, so that comparable experimental results can be seen in the remaining variables.
+If temperature is made the control variable and is not allowed to change throughout the course of the experiment, the relationship between the dependent variables, pressure and volume, can quickly be established by changing the value of one or the other; this is Boyle's law. For instance, if the pressure is raised then the volume must decrease.
+If, however, volume is made the control variable and is not allowed to change throughout the course of the experiment, the relationship between the dependent variables, pressure and temperature, can quickly be established by changing the value of one or the other; this is Gay-Lussac's law. For instance, if the pressure is raised then the temperature must increase.
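+
+The two cases above can be written out directly from the combined gas law. This is only a sketch for a fixed quantity of ideal gas; the subscripts 1 and 2 (the states before and after the change) are notation introduced here for illustration:
+
+```latex
+\frac{P_1 V_1}{T_1} = \frac{P_2 V_2}{T_2} = k
+\quad\Longrightarrow\quad
+\begin{cases}
+P_1 V_1 = P_2 V_2, & T_1 = T_2 \ \text{(temperature controlled: Boyle's law)} \\[4pt]
+\dfrac{P_1}{T_1} = \dfrac{P_2}{T_2}, & V_1 = V_2 \ \text{(volume controlled: Gay-Lussac's law)}
+\end{cases}
+```
+
+For instance, with temperature held constant, doubling the pressure ($P_2 = 2P_1$) gives $V_2 = V_1/2$; with volume held constant, doubling the pressure requires $T_2 = 2T_1$, with temperatures in kelvins.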
+
+
+== Notes ==
+
+
+== References ==
+
+
+== External links ==
+Definitions; Science Buddies – Science Fair Projects.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Dendrochronology-0.md b/data/en.wikipedia.org/wiki/Dendrochronology-0.md
new file mode 100644
index 000000000..eea820a7d
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Dendrochronology-0.md
@@ -0,0 +1,25 @@
+---
+title: "Dendrochronology"
+chunk: 1/4
+source: "https://en.wikipedia.org/wiki/Dendrochronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:31.464391+00:00"
+instance: "kb-cron"
+---
+
+Dendrochronology  (or tree-ring dating) is the scientific method of dating tree rings (also called growth rings) to the exact year they were formed in a tree. As well as dating them, this can give data for dendroclimatology, the study of climate and atmospheric conditions during different periods in history from the wood of old trees. Dendrochronology derives from the Ancient Greek dendron (δένδρον), meaning "tree", khronos (χρόνος), meaning "time", and -logia (-λογία), "the study of".
+Dendrochronology is useful for determining the precise age of samples, especially those that are too recent for radiocarbon dating, which always produces a range rather than an exact date. However, for a precise date of the death of the tree a full sample to the edge is needed, which most trimmed timber will not provide.  It also gives data on the timing of events and rates of change in the environment (most prominently climate) and also in wood found in archaeology or works of art and architecture, such as old panel paintings. It is also used as a check in radiocarbon dating to calibrate radiocarbon ages.
+New growth in trees occurs in a layer of cells near the bark. A tree's growth rate changes in a predictable pattern throughout the year in response to seasonal climate changes, resulting in visible growth rings. Each ring marks a complete cycle of seasons, or one year, in the tree's life. As of 2023, securely dated tree-ring data for Germany, Bohemia and Ireland are available going back 13,910 years. A new method is based on measuring variations in oxygen isotopes in each ring, and this 'isotope dendrochronology' can yield results on samples which are not suitable for traditional dendrochronology due to too few or too similar rings. Some regions have "floating sequences", with gaps which mean that earlier periods can only be approximately dated. As of 2024, only three areas have continuous sequences going back to prehistoric times: the foothills of the Northern Alps, the southwestern United States, and the British Isles. Miyake events, which are major spikes in cosmic rays at known dates, are visible in tree rings and can fix the dating of a floating sequence.
+
+== History ==
+The Greek botanist Theophrastus (c. 371 – c. 287 BC) first mentioned that the wood of trees has rings. In his 1651 Trattato della Pittura (Treatise on Painting), Leonardo da Vinci (1452–1519) was the first person to mention that trees form rings annually and that their thickness is determined by the conditions under which they grew. In 1737, French investigators Henri-Louis Duhamel du Monceau and Georges-Louis Leclerc de Buffon examined the effect of growing conditions on the shape of tree rings. They found that in 1709, a severe winter produced a distinctly dark tree ring, which served as a reference for subsequent European naturalists. In the U.S., Alexander Catlin Twining (1801–1884) suggested in 1833 that patterns among tree rings could be used to synchronize the dendrochronology of various trees and thereby to reconstruct past climates across entire regions. The English polymath Charles Babbage proposed using dendrochronology to date the remains of trees in peat bogs or even in geological strata (1835, 1838).
+During the latter half of the nineteenth century, the scientific study of tree rings and the application of dendrochronology began. In 1859, the German-American Jacob Kuechler (1823–1893) used crossdating to examine oaks (Quercus stellata) in order to study the record of climate in western Texas. In 1866, the German botanist, entomologist, and forester Julius Theodor Christian Ratzeburg (1801–1871) observed the effects on tree rings of defoliation caused by insect infestations. By 1882, this observation was already appearing in forestry textbooks. In the 1870s, the Dutch astronomer Jacobus Kapteyn (1851–1922) was using crossdating to reconstruct the climates of the Netherlands and Germany. In 1881, the Swiss-Austrian forester Arthur von Seckendorff-Gudent (1845–1886) was using crossdating. From 1869 to 1901, Robert Hartig (1839–1901), a German professor of forest pathology, wrote a series of papers on the anatomy and ecology of tree rings. In 1892, the Russian physicist Fedor Nikiforovich Shvedov (1841–1905) wrote that he had used patterns found in tree rings to predict droughts in 1882 and 1891.
+During the first half of the twentieth century, the astronomer A. E. Douglass founded the Laboratory of Tree-Ring Research at the University of Arizona. Douglass sought to better understand cycles of sunspot activity and reasoned that changes in solar activity would affect climate patterns on earth, which would subsequently be recorded by tree-ring growth patterns (i.e., sunspots → climate → tree rings).
+
+== Methods ==
+
+=== Growth rings ===
+
+Horizontal cross sections cut through the trunk of a tree can reveal growth rings, also referred to as tree rings or annual rings. Growth rings result from new growth in the vascular cambium, a layer of cells near the bark that botanists classify as a lateral meristem; this growth in diameter is known as secondary growth. Visible rings result from the change in growth speed through the seasons of the year; thus, critical for the title method, one ring generally marks the passage of one year in the life of the tree. Removal of the bark of the tree in a particular area may cause deformation of the rings as the plant overgrows the scar.
+The rings are more visible in trees which have grown in temperate zones, where the seasons differ more markedly. The inner portion of a growth ring forms early in the growing season, when growth is comparatively rapid (hence the wood is less dense) and is known as "early wood" (or "spring wood", or "late-spring wood"); the outer portion is the "late wood" (sometimes termed "summer wood", often being produced in the summer, though sometimes in the autumn) and is denser.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Dendrochronology-1.md b/data/en.wikipedia.org/wiki/Dendrochronology-1.md
new file mode 100644
index 000000000..7af209425
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Dendrochronology-1.md
@@ -0,0 +1,239 @@
+---
+title: "Dendrochronology"
+chunk: 2/4
+source: "https://en.wikipedia.org/wiki/Dendrochronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:31.464391+00:00"
+instance: "kb-cron"
+---
+
+Many trees in temperate zones produce one growth-ring each year, with the newest adjacent to the bark. Hence, for the entire period of a tree's life, a year-by-year record or ring pattern builds up that reflects the age of the tree and the climatic conditions in which the tree grew. Adequate moisture and a long growing season result in a wide ring, while a drought year may result in a very narrow one.
+Direct reading of tree ring chronologies is a complex science, for several reasons. First, contrary to the single-ring-per-year paradigm, alternating poor and favorable conditions, such as mid-summer droughts, can result in several rings forming in a given year. In addition, particular tree species may present "missing rings", and this influences the selection of trees for study of long time-spans. For instance, missing rings are rare in oak and elm trees.
+Critical to the science, trees from the same region tend to develop the same patterns of ring widths for a given period of chronological study. Researchers can compare and match these patterns ring-for-ring with patterns from trees which have grown at the same time in the same geographical zone (and therefore under similar climatic conditions). When one can match these tree-ring patterns across successive trees in the same locale, in overlapping fashion, chronologies can be built up—both for entire geographical regions and for sub-regions. Moreover, wood from ancient structures with known chronologies can be matched to the tree-ring data (a technique called 'cross-dating'), and the age of the wood can thereby be determined precisely. Dendrochronologists originally carried out cross-dating by visual inspection; more recently, they have harnessed computers to do the task, applying statistical techniques to assess the matching. To eliminate individual variations in tree-ring growth, dendrochronologists take the smoothed average of the tree-ring widths of multiple tree-samples to build up a 'ring history', a process termed replication. A tree-ring history whose beginning- and end-dates are not known is called a 'floating chronology'. It can be anchored by cross-matching a section against another chronology (tree-ring history) whose dates are known.
+A fully anchored and cross-matched chronology for oak and pine in central Europe extends back 12,460 years, and an oak chronology goes back 7,506 years in Bohemia, 7,429 years in Ireland and 6,939 years in England. Comparison of radiocarbon and dendrochronological ages supports the consistency of these two independent dendrochronological sequences. Another fully anchored chronology that extends back 8,500 years exists for the bristlecone pine in the Southwest US (White Mountains of California).
+
+=== Dendrochronological equation ===
+
+The dendrochronological equation defines the law of growth of tree rings.  The equation was proposed by Russian biophysicist Alexandr N. Tetearing in his work "Theory of populations" in the form:
+
+$$\Delta L(t)={\frac {1}{k_{v}\,\rho ^{\frac {1}{3}}}}\,{\frac {d\left(M^{\frac {1}{3}}(t)\right)}{dt}},$$
+
+where ΔL is the width of the annual ring, t is time (in years), ρ is the density of the wood, k_v is a coefficient, and M(t) is the mass-growth function of the tree.
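+
+As a short worked step (a sketch that uses only the symbols defined above), applying the chain rule to the derivative of the cube root of M(t) gives an equivalent form:
+
+$$\Delta L(t)=\frac{1}{3\,k_{v}\,\rho^{\frac{1}{3}}}\,M^{-\frac{2}{3}}(t)\,\frac{dM(t)}{dt}$$
+
+In this form the ring width is proportional to the rate of mass growth divided by M(t) raised to the power 2/3, so for a given growth rate a more massive tree lays down a narrower ring.
+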
+Ignoring the natural sinusoidal oscillations in tree mass, the formula for the changes in the annual ring width is:
+
+$$\Delta L(t)=-\frac{c_{1}e^{-a_{1}t}+c_{2}e^{-a_{2}t}}{3k_{v}\rho^{\frac{1}{3}}\left(c_{4}+c_{1}e^{-a_{1}t}+c_{2}e^{-a_{2}t}\right)^{\frac{2}{3}}}$$
+
+where c_1, c_2, and c_4 are coefficients, and a_1 and a_2 are positive constants.
+The formula is useful for approximating sample data before the data are normalized. Typical forms of the annual ring-width function ΔL(t) are shown in the figures.
+
+=== Sampling and dating ===
+Dendrochronology allows specimens of once-living material to be accurately dated to a specific year. Dates are often represented as estimated calendar years B.P., for before present, where "present" refers to 1 January 1950.
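+
+As a worked conversion (a sketch assuming the 1950 reference year stated above and the usual calendar convention that has no year zero), an age of t years B.P. corresponds to:
+
+$$\text{year AD}=1950-t\quad(t\le 1949),\qquad \text{year BC}=t-1949\quad(t\ge 1950)$$
+
+For example, 950 B.P. corresponds to AD 1000, and 11,000 B.P. corresponds to 9051 BC.
+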
+Timber cores are extracted (often with an increment borer) and used to measure the width of annual growth rings; by taking samples from different sites within a particular region, researchers can build a comprehensive historical sequence. The techniques of dendrochronology are more consistent in areas where trees grew in marginal conditions, such as aridity or semi-aridity, where ring growth is more sensitive to the environment, than in humid areas, where tree-ring growth is more uniform (complacent). In addition, some genera of trees are more suitable than others for this type of analysis. For instance, the bristlecone pine is exceptionally long-lived and slow growing, and has been used extensively for chronologies; still-living and dead specimens of this species provide tree-ring patterns going back thousands of years, in some regions more than 10,000 years. Currently, the maximum span for a fully anchored chronology is a little over 11,000 years B.P.
+IntCal20 is the 2020 "Radiocarbon Age Calibration Curve", which provides a calibrated carbon 14 dated sequence going back 55,000 years. The most recent part, going back 13,900 years, is based on tree rings.
+
+=== Reference sequences ===
+European chronologies derived from wooden structures initially found it difficult to bridge the gap in the fourteenth century when there was a building hiatus, which coincided with the Black Death. However, there do exist unbroken chronologies dating back to prehistoric times, for example the Danish chronology dating back to 352 BC.
+Given a sample of wood, the variation of the tree-ring growths not only provides a match by year, but can also match location because climate varies from place to place. This makes it possible to determine the source of ships as well as of smaller wooden artifacts that were transported long distances, such as panels for paintings and ship timbers.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Dendrochronology-2.md b/data/en.wikipedia.org/wiki/Dendrochronology-2.md
new file mode 100644
index 000000000..824a5d163
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Dendrochronology-2.md
@@ -0,0 +1,39 @@
+---
+title: "Dendrochronology"
+chunk: 3/4
+source: "https://en.wikipedia.org/wiki/Dendrochronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:31.464391+00:00"
+instance: "kb-cron"
+---
+
+=== Miyake events ===
+Miyake events, such as the ones in 774–775 and 993–994, can provide fixed reference points in an unknown time sequence as they are due to cosmic radiation. As they appear as spikes in carbon 14 in tree rings for that year all round the world, they can be used to date historical events to the year. For example, wooden houses in the Viking site at L'Anse aux Meadows in Newfoundland were dated by finding the layer with the 993 spike, which showed that the wood is from a tree felled in 1021. Researchers at the University of Bern have provided exact dating of a floating sequence in a Neolithic settlement in northern Greece by tying it to a spike in cosmogenic radiocarbon in 5259 BC.
+
+=== Frost rings ===
+A frost ring is a layer of deformed, collapsed tracheids and traumatic parenchyma cells identified in tree-ring analysis. Frost rings form when the air temperature falls below freezing during a period of cambial activity. They can be used in dendrochronology to indicate years that were colder than usual.
+
+== Applications ==
+
+=== Radiocarbon dating calibration ===
+Dates from dendrochronology can be used to calibrate and check radiocarbon dating. This can be done by checking radiocarbon dates against long master sequences; Californian bristlecone pines in Arizona were used to develop this method of calibration, as the longevity of the trees (up to c. 4900 years), together with the use of dead samples, meant that a long, unbroken tree-ring sequence could be developed (dating back to c. 6700 BC). Additional studies of European oak trees, such as the master sequence in Germany that dates back to c. 8500 BC, can also be used to back up and further calibrate radiocarbon dates.
+
+=== Climatology ===
+
+Dendroclimatology is the science of determining past climates from trees, primarily from the properties of the annual tree rings. Other properties of the annual rings, such as maximum latewood density (MXD), have been shown to be better proxies than simple ring width. Using tree rings, scientists have estimated many local climates for hundreds to thousands of years into the past.
+
+=== Art history ===
+Dendrochronology has become important to art historians in the dating of panel paintings. However, unlike analysis of samples from buildings, which are typically sent to a laboratory, wooden supports for paintings usually have to be measured in a museum conservation department, which places limitations on the techniques that can be used.
+In addition to dating, dendrochronology can also provide information as to the source of the panel. Many Early Netherlandish paintings have turned out to be painted on panels of "Baltic oak" shipped from the Vistula region via ports of the Hanseatic League. Oak panels were used in a number of northern countries such as England, France and Germany. Wooden supports other than oak were rarely used by Netherlandish painters.
+
+Since panels of seasoned wood were used, an uncertain number of years has to be allowed for seasoning when estimating dates. Panels were trimmed of the outer rings, and often each panel only uses a small part of the radius of the trunk. Consequently, dating studies usually result in a terminus post quem (earliest possible) date, and a tentative date for the arrival of a seasoned raw panel using assumptions as to these factors. As a result of establishing numerous sequences, it was possible to date 85–90% of the 250 paintings from the fourteenth to seventeenth century analysed between 1971 and 1982; by now a much greater number have been analysed.
+A portrait of Mary, Queen of Scots in the National Portrait Gallery, London was believed to be an eighteenth-century copy. However, dendrochronology revealed that the wood dated from the second half of the sixteenth century. It is now regarded as an original sixteenth-century painting by an unknown artist.
+On the other hand, dendrochronology was applied to four paintings depicting the same subject, that of Christ expelling the money-lenders from the Temple. The results showed that the age of the wood was too late for any of them to have been painted by Hieronymus Bosch.
+While dendrochronology has become an important tool for dating oak panels, it is not effective in dating the poplar panels often used by Italian painters because of the erratic growth rings in poplar.
+The sixteenth century saw a gradual replacement of wooden panels by canvas as the support for paintings, which means the technique is less often applicable to later paintings. In addition, many panel paintings were transferred onto canvas or other supports during the nineteenth and twentieth centuries.
+
+=== Archaeology ===
+
+The dating of buildings with wooden structures and components is also done by dendrochronology; dendroarchaeology is the term for the application of dendrochronology in archaeology. While archaeologists can date wood and when it was felled, it may be difficult to definitively determine the age of a building or structure in which the wood was used; the wood could have been reused from an older structure, may have been felled and left for many years before use, or could have been used to replace a damaged piece of wood. The dating of buildings via dendrochronology thus requires knowledge of the history of building technology. Many prehistoric forms of buildings used "posts" that were whole young tree trunks; where the bottom of the post has survived in the ground these can be especially useful for dating.
+Examples:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Dendrochronology-3.md b/data/en.wikipedia.org/wiki/Dendrochronology-3.md
new file mode 100644
index 000000000..a484e450f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Dendrochronology-3.md
@@ -0,0 +1,53 @@
+---
+title: "Dendrochronology"
+chunk: 4/4
+source: "https://en.wikipedia.org/wiki/Dendrochronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:31.464391+00:00"
+instance: "kb-cron"
+---
+
+The Post Track and Sweet Track, ancient timber trackways in the Somerset levels, England, have been dated to 3838 BC and 3807 BC.
+Navan Fort, where in prehistoric Ireland a large structure was built with more than two hundred posts. The central oak post was felled in 95 BC.
+The Fairbanks House in Dedham, Massachusetts. While the house had long been claimed to have been built c. 1640 (and to be the oldest wood-framed house in North America), core samples of wood taken from a summer beam confirmed the wood was from an oak tree felled in 1637–8, as wood was not seasoned before use in building at that time in New England. An additional sample from another beam yielded a date of 1641, thus confirming the house had been constructed starting in 1638 and finished sometime after 1641.
+The burial chamber of Gorm the Old, who died c. 958, was constructed from wood of timbers felled in 958.
+Veliky Novgorod, where, between the tenth and the fifteenth century, numerous consecutive layers of wooden log pavement have been placed over the accumulating dirt.
+The Neolithic well with linings made of oak wood, found near Ostrov, Czech Republic, has been dated to 5482–5243 BC.
+
+== Measurement platforms, software, and data formats ==
+There are many different file formats used to store tree-ring width data. An effort at standardisation was made with the development of TRiDaS. Further development led to the database software Tellervo, which is based on the new standard format whilst being able to import many other data formats. The desktop application can be attached to measurement devices and works with a database server that is installed separately.
+
+== Continuous sequence ==
+Bard et al. wrote in 2023: "The oldest tree-ring series are known as floating since, while their constituent rings can be counted to create a relative internal chronology, they cannot be dendro-matched with the main Holocene absolute chronology. However, 14C analyses performed at high resolution on overlapped absolute and floating tree-rings series enable one to link them almost absolutely and hence to extend the calibration on annual tree rings until ≈13 900 cal yr BP."
+Some of the longest tree-ring timelines, especially those extending earlier than about 4000 BC, are joined by comparing how similar the ring patterns look instead of by directly overlapping pieces of wood from different trees. In those sections only a few samples match, so the connection is less certain.
+
+== Related chronologies ==
+
+Herbchronology is the analysis of annual growth rings (or simply annual rings) in the secondary root xylem of perennial herbaceous plants. Similar seasonal patterns also occur in ice cores and in varves (layers of sediment deposition in a lake, river, or sea bed). The deposition pattern in the core will vary for a frozen-over lake versus an ice-free lake, and with the fineness of the sediment. Sclerochronology is the study of algae deposits.
+Some columnar cacti also exhibit similar seasonal patterns in the isotopes of carbon and oxygen in their spines (acanthochronology). These are used for dating in a manner similar to dendrochronology, and such techniques are used in combination with dendrochronology, to plug gaps and to extend the range of the seasonal data available to archaeologists and paleoclimatologists.
+A similar technique is used to estimate the age of fish stocks through the analysis of growth rings in the otolith bones.
+
+== See also ==
+
+Dendrology
+International Tree-Ring Data Bank
+Post excavation
+Timeline of dendrochronology timestamp events
+
+== External links ==
+
+Nottingham Tree-Ring Dating Laboratory
+Oxford Tree-Ring Laboratory
+Dendrochronology and Art History of Painted Ceilings (Historic Environment Scotland, 2017).
+Video & commentary on medullary rays, heart wood, and tree rings.
+Video & commentary on Tree Rings – Formation and Purpose
+Bibliography of Dendrochronology
+Multilingual Glossary of Dendrochronology
+Digital Collaboratory for Cultural Dendrochronology (DCCD)
+International Tree-Ring Data Bank
+Laboratory of Tree-Ring Research University of Arizona
+"Tree Ring Science", the academic site of Prof. Henri D. Grissino-Mayer, Department of Geography, The University of Tennessee, and the Laboratory of Tree-Ring Science
+Briand, Christopher H.; Brazer, Susan E.; Harter-Dennis, Jeannine M. (December 2006). "Tree Rings and the Aging of Trees: A Controversy in 19th Century America". Tree-Ring Research. 62 (2): 51–65. doi:10.3959/1536-1098-62.2.51. hdl:10150/262645. S2CID 162884050.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Eddington_experiment-0.md b/data/en.wikipedia.org/wiki/Eddington_experiment-0.md
index e251aff3e..2ea910c13 100644
--- a/data/en.wikipedia.org/wiki/Eddington_experiment-0.md
+++ b/data/en.wikipedia.org/wiki/Eddington_experiment-0.md
@@ -4,7 +4,7 @@ chunk: 1/5
 source: "https://en.wikipedia.org/wiki/Eddington_experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:12:45.715901+00:00"
+date_saved: "2026-05-05T09:56:35.485516+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Eddington_experiment-1.md b/data/en.wikipedia.org/wiki/Eddington_experiment-1.md
index d78236826..7738b7164 100644
--- a/data/en.wikipedia.org/wiki/Eddington_experiment-1.md
+++ b/data/en.wikipedia.org/wiki/Eddington_experiment-1.md
@@ -4,7 +4,7 @@ chunk: 2/5
 source: "https://en.wikipedia.org/wiki/Eddington_experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:12:45.715901+00:00"
+date_saved: "2026-05-05T09:56:35.485516+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Eddington_experiment-2.md b/data/en.wikipedia.org/wiki/Eddington_experiment-2.md
index ddf503cd5..66bf818f4 100644
--- a/data/en.wikipedia.org/wiki/Eddington_experiment-2.md
+++ b/data/en.wikipedia.org/wiki/Eddington_experiment-2.md
@@ -4,7 +4,7 @@ chunk: 3/5
 source: "https://en.wikipedia.org/wiki/Eddington_experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:12:45.715901+00:00"
+date_saved: "2026-05-05T09:56:35.485516+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Eddington_experiment-3.md b/data/en.wikipedia.org/wiki/Eddington_experiment-3.md
index 9394a9d25..83182a171 100644
--- a/data/en.wikipedia.org/wiki/Eddington_experiment-3.md
+++ b/data/en.wikipedia.org/wiki/Eddington_experiment-3.md
@@ -4,7 +4,7 @@ chunk: 4/5
 source: "https://en.wikipedia.org/wiki/Eddington_experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:12:45.715901+00:00"
+date_saved: "2026-05-05T09:56:35.485516+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Eddington_experiment-4.md b/data/en.wikipedia.org/wiki/Eddington_experiment-4.md
index 75474cacd..9adecda6e 100644
--- a/data/en.wikipedia.org/wiki/Eddington_experiment-4.md
+++ b/data/en.wikipedia.org/wiki/Eddington_experiment-4.md
@@ -4,7 +4,7 @@ chunk: 5/5
 source: "https://en.wikipedia.org/wiki/Eddington_experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:12:45.715901+00:00"
+date_saved: "2026-05-05T09:56:35.485516+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence-0.md b/data/en.wikipedia.org/wiki/Empirical_evidence-0.md
new file mode 100644
index 000000000..f935276b3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence-0.md
@@ -0,0 +1,20 @@
+---
+title: "Empirical evidence"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Empirical_evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:49.542435+00:00"
+instance: "kb-cron"
+---
+
+Empirical evidence is evidence obtained through sense experience or experimental procedure. It is of central importance to the sciences and plays a role in various other fields, like epistemology and law.
+There is no general agreement on how the terms evidence and empirical are to be defined. Often different fields work with quite different conceptions. In epistemology, evidence is what justifies beliefs or what determines whether holding a certain belief is rational. This is only possible if the evidence is possessed by the person, which has prompted various epistemologists to conceive evidence as private mental states like experiences or other beliefs. In philosophy of science, on the other hand, evidence is understood as that which confirms or disconfirms scientific hypotheses and arbitrates between competing theories. For this role, evidence must be public and uncontroversial, like observable physical objects or events and unlike private mental states, so that evidence may foster scientific consensus. The term empirical comes from Greek ἐμπειρία empeiría, i.e. 'experience'. In this context, it is usually understood as what is observable, in contrast to unobservable or theoretical objects. It is generally accepted that unaided perception constitutes observation, but it is disputed to what extent objects accessible only to aided perception, like bacteria seen through a microscope or positrons detected in a cloud chamber, should be regarded as observable.
+Empirical evidence is essential to a posteriori knowledge or empirical knowledge, knowledge whose justification or falsification depends on experience or experiment. A priori knowledge, on the other hand, is seen either as innate or as justified by rational intuition and therefore as not dependent on empirical evidence. Rationalism fully accepts that there is knowledge a priori, which is either outright rejected by empiricism or accepted only in a restricted way as knowledge of relations between our concepts but not as pertaining to the external world.
+Scientific evidence is closely related to empirical evidence but not all forms of empirical evidence meet the standards dictated by scientific methods. Sources of empirical evidence are sometimes divided into observation and experimentation, the difference being that only experimentation involves manipulation or intervention: phenomena are actively created instead of being passively observed.
+
+== Background ==
+
+The concept of evidence is of central importance in epistemology and in philosophy of science but plays different roles in these two fields. In epistemology, evidence is what justifies beliefs or what determines whether holding a certain doxastic attitude is rational. For example, the olfactory experience of smelling smoke justifies or makes it rational to hold the belief that something is burning. It is usually held that for justification to work, the evidence has to be possessed by the believer. The most straightforward way to account for this type of evidence possession is to hold that evidence consists of the private mental states possessed by the believer.
+Some philosophers restrict evidence even further, for example, to only conscious, propositional or factive mental states. Restricting evidence to conscious mental states has the implausible consequence that many simple everyday beliefs would be unjustified. This is why it is more common to hold that all kinds of mental states, including stored but currently unconscious beliefs, can act as evidence. Various of the roles played by evidence in reasoning, for example, in explanatory, probabilistic and deductive reasoning, suggest that evidence has to be propositional in nature, i.e. that it is correctly expressed by propositional attitude verbs like "believe" together with a that-clause, like "that something is burning". But it runs counter to the common practice of treating non-propositional sense-experiences, like bodily pains, as evidence. Its defenders sometimes combine it with the view that evidence has to be factive, i.e. that only attitudes towards true propositions constitute evidence. In this view, there is no misleading evidence. The olfactory experience of smoke would count as evidence if it was produced by a fire but not if it was produced by a smoke generator. This position has problems in explaining why it is still rational for the subject to believe that there is a fire even though the olfactory experience cannot be considered evidence.
+In philosophy of science, evidence is understood as that which confirms or disconfirms scientific hypotheses and arbitrates between competing theories. Measurements of Mercury's "anomalous" orbit, for example, constitute evidence that plays the role of neutral arbiter between Newton's and Einstein's theory of gravitation by confirming Einstein's theory. For scientific consensus, it is central that evidence is public and uncontroversial, like observable physical objects or events and unlike private mental states. This way it can act as a shared ground for proponents of competing theories. Two issues threatening this role are the problem of underdetermination and theory-ladenness. The problem of underdetermination concerns the fact that the available evidence often provides equal support to either theory and therefore cannot arbitrate between them. Theory-ladenness refers to the idea that evidence already includes theoretical assumptions. These assumptions can hinder it from acting as neutral arbiter. It can also lead to a lack of shared evidence if different scientists do not share these assumptions. Thomas Kuhn is an important advocate of the position that theory-ladenness concerning scientific paradigms plays a central role in science.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence-1.md b/data/en.wikipedia.org/wiki/Empirical_evidence-1.md
new file mode 100644
index 000000000..6fb455640
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence-1.md
@@ -0,0 +1,21 @@
+---
+title: "Empirical evidence"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Empirical_evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:49.542435+00:00"
+instance: "kb-cron"
+---
+
+== Definition ==
+A thing is evidence for a proposition if it epistemically supports this proposition or indicates that the supported proposition is true. Evidence is empirical if it is constituted by or accessible to sensory experience. There are various competing theories about the exact definition of the terms evidence and empirical. Different fields, like epistemology, the sciences or legal systems, often associate different concepts with these terms. An important distinction among theories of evidence is whether they identify evidence with private mental states or with public physical objects. Concerning the term empirical, there is a dispute about where to draw the line between observable or empirical objects in contrast to unobservable or merely theoretical objects.
+The traditional view proposes that evidence is empirical if it is constituted by or accessible to sensory experience. This involves experiences arising from the stimulation of the sense organs, like visual or auditory experiences, but the term is often used in a wider sense including memories and introspection. It is usually seen as excluding purely intellectual experiences, like rational insights or intuitions used to justify basic logical or mathematical principles. The terms empirical and observable are closely related and sometimes used as synonyms.
+There is an active debate in contemporary philosophy of science as to what should be regarded as observable or empirical in contrast to unobservable or merely theoretical objects. There is general consensus that everyday objects like books or houses are observable since they are accessible via unaided perception, but disagreement starts for objects that are only accessible through aided perception. This includes using telescopes to study distant galaxies, microscopes to study bacteria or using cloud chambers to study positrons. So the question is whether distant galaxies, bacteria or positrons should be regarded as observable or merely theoretical objects. Some even hold that any measurement process of an entity should be considered an observation of this entity. In this sense, the interior of the Sun is observable since neutrinos originating there can be detected. The difficulty with this debate is that there is a continuity of cases going from looking at something with the naked eye, through a window, through a pair of glasses, through a microscope, etc. Because of this continuity, drawing the line between any two adjacent cases seems to be arbitrary. One way to avoid these difficulties is to hold that it is a mistake to identify the empirical with what is observable or sensible. Instead, it has been suggested that empirical evidence can include unobservable entities as long as they are detectable through suitable measurements. A problem with this approach is that it is rather far from the original meaning of "empirical", which contains the reference to experience.
+
+== Related concepts ==
+
+=== Knowledge a posteriori and a priori ===
+
+Knowledge or the justification of a belief is said to be a posteriori if it is based on empirical evidence. A posteriori refers to what depends on experience (what comes after experience), in contrast to a priori, which stands for what is independent of experience (what comes before experience). For example, the proposition that "all bachelors are unmarried" is knowable a priori since its truth only depends on the meanings of the words used in the expression. The proposition "some bachelors are happy", on the other hand, is only knowable a posteriori since it depends on experience of the world as its justifier. Immanuel Kant held that the difference between a posteriori and a priori is tantamount to the distinction between empirical and non-empirical knowledge.
+Two central questions for this distinction concern the relevant sense of "experience" and of "dependence". The paradigmatic justification of knowledge a posteriori consists in sensory experience, but other mental phenomena, like memory or introspection, are also usually included in it. But purely intellectual experiences, like rational insights or intuitions used to justify basic logical or mathematical principles, are normally excluded from it. There are different senses in which knowledge may be said to depend on experience. In order to know a proposition, the subject has to be able to entertain this proposition, i.e. possess the relevant concepts. For example, experience is necessary to entertain the proposition "if something is red all over then it is not green all over" because the terms "red" and "green" have to be acquired this way. But the sense of dependence most relevant to empirical evidence concerns the status of justification of a belief. So experience may be needed to acquire the relevant concepts in the example above, but once these concepts are possessed, no further experience providing empirical evidence is needed to know that the proposition is true, which is why it is considered to be justified a priori.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence-2.md b/data/en.wikipedia.org/wiki/Empirical_evidence-2.md
new file mode 100644
index 000000000..2900170a3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence-2.md
@@ -0,0 +1,37 @@
+---
+title: "Empirical evidence"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Empirical_evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:49.542435+00:00"
+instance: "kb-cron"
+---
+
+=== Empiricism and rationalism ===
+In its strictest sense, empiricism is the view that all knowledge is based on experience or that all epistemic justification arises from empirical evidence. This stands in contrast to the rationalist view, which holds that some knowledge is independent of experience, either because it is innate or because it is justified by reason or rational reflection alone. Expressed through the distinction between knowledge a priori and a posteriori from the previous section, rationalism affirms that there is knowledge a priori, which is denied by empiricism in this strict form. One difficulty for empiricists is to account for the justification of knowledge pertaining to fields like mathematics and logic, for example, that 3 is a prime number or that modus ponens is a valid form of deduction. The difficulty is due to the fact that there seems to be no good candidate of empirical evidence that could justify these beliefs. Such cases have prompted empiricists to allow for certain forms of knowledge a priori, for example, concerning tautologies or relations between our concepts. These concessions preserve the spirit of empiricism insofar as the restriction to experience still applies to knowledge about the external world. In some fields, like metaphysics or ethics, the choice between empiricism and rationalism makes a difference not just for how a given claim is justified but for whether it is justified at all. This is best exemplified in metaphysics, where empiricists tend to take a skeptical position, thereby denying the existence of metaphysical knowledge, while rationalists seek justification for metaphysical claims in metaphysical intuitions.
+
+=== Scientific evidence ===
+
+Scientific evidence is closely related to empirical evidence. Some theorists, like Carlos Santana, have argued that there is a sense in which not all empirical evidence constitutes scientific evidence. One reason for this is that the standards or criteria that scientists apply to evidence exclude certain evidence that is legitimate in other contexts. For example, anecdotal evidence from a friend about how to treat a certain disease constitutes empirical evidence that this treatment works but would not be considered scientific evidence. Others have argued that the traditional empiricist definition of empirical evidence as perceptual evidence is too narrow for much of scientific practice, which uses evidence from various kinds of non-perceptual equipment.
+Central to scientific evidence is that it was arrived at by following the scientific method in the context of some scientific theory. But people rely on various forms of empirical evidence in their everyday lives that have not been obtained this way and therefore do not qualify as scientific evidence. One problem with non-scientific evidence is that it is less reliable, for example, due to cognitive biases like the anchoring effect, in which information obtained earlier is given more weight. Science done poorly is also subject to such biases, however, as in the example of p-hacking.
+
+=== Observation, experimentation and scientific method ===
+In the philosophy of science, it is sometimes held that there are two sources of empirical evidence: observation and experimentation.  The idea behind this distinction is that only experimentation involves manipulation or intervention: phenomena are actively created instead of being passively observed. For example, inserting viral DNA into a bacterium is a form of experimentation while studying planetary orbits through a telescope belongs to mere observation. In these cases, the mutated DNA was actively produced by the biologist while the planetary orbits are independent of the astronomer observing them. Applied to the history of science, it is sometimes held that ancient science is mainly observational while the emphasis on experimentation is only present in modern science and responsible for the Scientific Revolution. This is sometimes phrased through the expression that modern science actively "puts questions to nature". This distinction also underlies the categorization of sciences into experimental sciences, like physics, and observational sciences, like astronomy. While the distinction is relatively intuitive in paradigmatic cases, it has proven difficult to give a general definition of "intervention" applying to all cases, which is why it is sometimes outright rejected.
+Empirical evidence is required for a hypothesis to gain acceptance in the scientific community. Normally, this validation is achieved by the scientific method of forming a hypothesis, experimental design, peer review, reproduction of results, conference presentation, and journal publication. This requires rigorous communication of hypothesis (usually expressed in mathematics), experimental constraints and controls (expressed in terms of standard experimental apparatus), and a common understanding of measurement. In the scientific context, the term semi-empirical is used for qualifying theoretical methods that use, in part, basic axioms or postulated scientific laws and experimental results. Such methods are opposed to theoretical ab initio methods, which are purely deductive and based on first principles.  Typical examples of both ab initio and semi-empirical methods can be found in computational chemistry.
+
+== See also ==
+
+== Footnotes ==
+
+== References ==
+Bird, Alexander (2013). "Thomas Kuhn". In Zalta, Edward N. (ed.). Stanford Encyclopedia of Philosophy. Section 4.2 Perception, Observational Incommensurability, and World-Change. Retrieved 25 January 2012.
+Craig, Edward (2005). "a posteriori". The Shorter Routledge Encyclopedia of Philosophy. Routledge. ISBN 978-0415324953.
+Feldman, Richard (2001) [1999]. "Evidence". In Audi, Robert (ed.). The Cambridge Dictionary of Philosophy (2nd ed.). Cambridge, UK: Cambridge University Press. pp. 293–294. ISBN 978-0521637220.
+Kuhn, Thomas S. (1970) [1962]. The Structure of Scientific Revolutions (2nd ed.). Chicago: University of Chicago Press. ISBN 978-0226458045.
+Pickett, Joseph P., ed. (2011). The American Heritage Dictionary of the English Language (5th ed.). Houghton Mifflin. ISBN 978-0-547-04101-8.
+
+== External links ==
+ The dictionary definition of empirical at Wiktionary
+ The dictionary definition of evidence at Wiktionary
+Fieser, James; Dowden, Bradley (eds.). "A Priori and A Posteriori". Internet Encyclopedia of Philosophy. ISSN 2161-0002. OCLC 37741658.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-0.md b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-0.md
new file mode 100644
index 000000000..d7af29fc9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-0.md
@@ -0,0 +1,29 @@
+---
+title: "Empirical evidence for the spherical shape of Earth"
+chunk: 1/7
+source: "https://en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:59.111630+00:00"
+instance: "kb-cron"
+---
+
+The roughly spherical shape of Earth can be empirically evidenced by many different types of observation, made from ground level, in flight, or from orbit. The spherical shape causes a number of effects and phenomena that, when combined, disprove flat Earth beliefs.
+These include the visibility of distant objects on Earth's surface; lunar eclipses; appearance of the Moon; observation of the sky from a certain altitude; observation of certain fixed stars from different locations; observing the Sun; surface navigation; grid distortion on a spherical surface; weather systems; gravity; and modern technology.
+
+== Visibility of distant objects on Earth's surface ==
+
+On a completely flat Earth without obstructions (mountains, hills, valleys or volcanos), the ground itself would never obscure distant objects. A spherical surface has a horizon which is closer when viewed from a lower altitude. In theory, a person standing on the surface with eyes 1.8 metres (5 ft 11 in) above the ground can see the ground up to about 4.79 kilometres (2.98 mi) away, but a person at the top of the Eiffel Tower at 273 metres (896 ft) can see the ground up to about 58.98 kilometres (36.65 mi) away. 
+This phenomenon permits a way of confirming that Earth's surface is locally convex: If the degree of curvature were determined to be the same everywhere on Earth's surface, and that surface were determined to be large enough, the constant curvature would show that Earth is spherical. In practice, this method is not reliable because of variations in atmospheric refraction, which is how much the atmosphere bends light traveling through it. Refraction can give the impression that Earth's surface is flat, curved more convexly than it is, or even that it is concave (this is what happened in various trials of the Bedford Level experiment).
+The phenomenon of variable atmospheric bending can be seen when distant objects appear to be broken into pieces or even turned upside down. This is often seen at sunset, when the Sun's shape is distorted, but has also been photographed happening to ships, and has caused the city of Chicago to appear normal, upside down, and broken into pieces from across Lake Michigan (from where it is normally below the horizon).
+
+When the atmosphere is relatively well-mixed, the visual effects generally expected of a spherical Earth can be observed. For example, ships travelling on large bodies of water (such as the ocean) disappear over the horizon progressively, such that the highest part of the ship can still be seen even when lower parts cannot, proportional to distance from the observer. Likewise, in the days of sailing ships, a sailor would climb up a mast to see farther. The same is true of the coastline or mountain when viewed from a ship or from across a large lake or flat terrain. In certain places, the curvature is visible via fixed objects. This includes the 23-mile (37 km) Lake Pontchartrain Causeway visible from a Metairie hotel, and the 85 pylons carrying 15 miles (24 km) of powerlines over Lake Pontchartrain, visible from I-10 Bonnet Carré Spillway Bridge.
+
+== Lunar eclipses ==
+
+The shadow of Earth on the Moon during a lunar eclipse is always a dark circle that moves from one side of the Moon to the other (partially grazing it during a partial eclipse). The only shape that casts a round shadow no matter which direction it is pointed is a sphere, and the ancient Greeks deduced that this must mean Earth is spherical.
+The effect could be produced by a disk that always faces the Moon head-on during the eclipse, but this is inconsistent with the fact that the Moon is only rarely directly overhead during an eclipse. For each eclipse, the local surface of Earth is pointed in a different direction. The shadow of a disk held at an angle is an oval, not a circle as is seen during the eclipse. The idea of Earth being a disk is also inconsistent with the fact that a given lunar eclipse is only visible from half of Earth at a time.
+
+== Appearance of the Moon ==
+
+The Moon's tidal lock to Earth results in the Moon's always showing only one side to Earth (see animated image). If Earth were flat, with the Moon hovering above it, then the portion of the Moon's surface visible to people on Earth would vary according to location on Earth, rather than showing an identical "face side" to everyone. If Earth were flat, with the Moon revolving around it tidally locked, then the Moon would be seen from all places on Earth at once, but its apparent size, the portion facing the viewer, and the facing side's orientation would gradually change for each viewer as its position moved across the sky over the course of the night.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-1.md b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-1.md
new file mode 100644
index 000000000..2a141edf8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-1.md
@@ -0,0 +1,26 @@
+---
+title: "Empirical evidence for the spherical shape of Earth"
+chunk: 2/7
+source: "https://en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:59.111630+00:00"
+instance: "kb-cron"
+---
+
+== Observation of the sky from altitude with the aid of a diagram ==
+On a perfectly spherical Earth, not considering obstructions and atmospheric refraction, its surface blocks almost half the sky for an observer close against the surface (see horizon). Moving away from the surface of Earth means that the ground blocks less and less of the sky. For example, when viewed from the Moon, Earth blocks only a small portion of the sky because it is so distant. This effect of geometry means that, when viewed from a high mountain, flat ground or ocean blocks less than a hemisphere of the sky. With the presumption of a spherical Earth, an expedition commissioned by caliph al-Ma'mun used this fact to calculate Earth's circumference to within 7,920 kilometres (4,920 mi) of the correct value of around 40,000 kilometres (25,000 mi), and possibly as accurately as 180 kilometres (110 mi). 
+The rate of change in the angle blocked by Earth as altitude increases would be different for a disk than for a sphere. The amount of surface blocked would be different for a mountain close to the edge of a flat Earth compared to a mountain in the middle of a flat Earth, but this is not observed. Surveys from all over Earth show that its shape is everywhere locally convex, confirming that it is very close to spherical.
+
+== Observation of fixed stars from different locations ==
+The fixed stars, for example the Pole Star (Polaris), can be demonstrated to be very far away by diurnal parallax measurements. Such measurements show no shifts in the stars' positions. Unlike the Sun, Moon, and planets, they do not change position with respect to one another over human lifetimes; the shapes of the constellations are constant. This makes them a convenient reference background for determining the shape of Earth. Adding distance measurements on the ground allows calculation of Earth's size.
+The fact that different stars are visible from different locations on Earth was noticed in ancient times. Aristotle wrote that some stars are visible from Egypt which are not visible from Europe. This would not be possible if Earth were flat.
+A star has an altitude above the horizon for an observer if the star is visible. Observing the same star at the same time from two different latitudes gives two different altitudes. Using geometry, the two altitudes along with the distance between the two locations allow for a calculation of Earth's size. Using observations of the star Canopus at Rhodes (in Greece) and Alexandria (in Egypt) and the distance between them, the Ancient Greek philosopher Posidonius used this technique to calculate the circumference of the planet to within perhaps 4% of the correct value. Modern equivalents of his units of measure are not precisely known, so it is not clear how accurate his measurement was.
+The Andalusian astronomer Ibn Rushd went to Marrakesh (in Morocco) to observe the same star in 1153, as it was invisible in his native Córdoba, Al-Andalus. He used the different visibility in different latitudes to argue that the Earth is round, following Aristotle's argument.
+
+=== Observation of constellations on North and South hemispheres at different seasons ===
+The North Pole is in continuous night for six months of the year. The star Polaris (the "North Star") is almost directly overhead and therefore at the center of the sky's apparent rotation. Some of the 88 modern constellations visible are Ursa Major (including the Big Dipper), Cassiopeia, and Andromeda. The other six months of the year, the North Pole is in continuous daylight, with light from the Sun blotting out the stars. This phenomenon, and its analogous effects at the South Pole, are what define the two poles. (More than 24 hours of continuous daylight can only occur north of the Arctic Circle and south of the Antarctic Circle.)
+At the South Pole, a completely different set of constellations is visible during the six months of continuous night, including Crux and Centaurus. This 180° hemisphere of stars rotates clockwise once every 24 hours around a point directly overhead.
+From any point on the equator, all of the stars visible anywhere on Earth on that day are visible at some time during the year as the sky rotates around a line drawn from due north to due south. When facing east, the stars visible from the north pole are on the left, and the stars visible from the south pole are on the right.
+The direction any intermediate spot on Earth is facing can also be calculated by measuring the angles of the fixed stars and determining how much of the sky is visible. For example, New York City is about 40° north of the equator. The apparent motion of the Sun blots out slightly different parts of the sky from day to day, but over the course of the entire year an observer there sees a dome of 280° (360° - 80°). So, for example, both Orion and the Big Dipper are visible during at least part of the year.
+Making stellar observations from a representative set of points across Earth, combined with knowing the shortest on-the-ground distance between any two given points, makes an approximate sphere the only possible shape for Earth.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-2.md b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-2.md
new file mode 100644
index 000000000..a8a618c08
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-2.md
@@ -0,0 +1,26 @@
+---
+title: "Empirical evidence for the spherical shape of Earth"
+chunk: 3/7
+source: "https://en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:59.111630+00:00"
+instance: "kb-cron"
+---
+
+== Observing the Sun ==
+On a flat Earth, a Sun that shines in all directions would illuminate the entire surface at the same time, and all places would experience sunrise and sunset at the horizon at about the same time. With a spherical Earth, half the planet is in daylight at any given time and the other half experiences nighttime. When a given location on the spherical Earth is in sunlight, its antipode – the location exactly on the opposite side of Earth – is in darkness. The spherical shape of Earth causes the Sun to rise and set at different times in different places, and different locations get different amounts of sunlight each day.
+In order to explain day and night, time zones, and the seasons, some flat Earth theorists propose that the Sun does not emit light in all directions, but acts more like a spotlight, only illuminating part of the flat Earth at a time. This conjecture is not consistent with observation: At sunrise and sunset, a spotlight Sun would be up in the sky at least a little bit, rather than at the horizon where it is always actually observed. A spotlight Sun would also appear at different angles in the sky with respect to a flat ground than it does with respect to a curved ground. Assuming light travels in straight lines, actual measurements of the Sun's angle in the sky from locations very distant from each other are only consistent with a geometry where the Sun is very far away and is being seen from the daylight half of a spherical Earth. These two phenomena are related: A low-altitude spotlight Sun would spend most of the day near the horizon for most locations on Earth, which is not observed, but rise and set fairly close to the horizon. A high-altitude Sun would spend more of the day away from the horizon, but rise and set fairly far from the horizon, which is also not observed.
+
+=== Changing length of the day ===
+
+On a flat Earth with an omnidirectional Sun, all places would experience the same amount of daylight every day, and all places would get daylight at the same time. Actual day length varies considerably, with places closer to the poles getting very long days in the summer and very short days in the winter, with northerly summer happening at the same time as southerly winter, and vice versa. Places north of the Arctic Circle and south of the Antarctic Circle get no sunlight for at least one day a year, and get 24-hour sunlight for at least one day a year. Both the poles experience sunlight for 6 months and darkness for 6 months, at opposite times.
+The movement of daylight between the northern and southern hemispheres happens because of the axial tilt of Earth. The imaginary line around which Earth spins, which goes between the North Pole and South Pole, is tilted about 23° from the perpendicular to the plane of its orbit around the Sun. Earth always points in the same direction as it moves around the Sun, so for half the year (summer in the Northern Hemisphere), the North Pole is pointed slightly toward the Sun, keeping it in daylight all the time because the Sun lights up the half of Earth that is facing it (and the North Pole is always in that half due to the tilt). For the other half of the orbit, the South Pole is tilted slightly toward the Sun, and it is winter in the Northern Hemisphere. This means that at the equator, the Sun is not directly overhead at noon, except around the March and September equinoxes, when one spot on the equator is pointed directly at the Sun.
+
+=== Length of the day beyond polar circles ===
+The length of the day varies because as Earth rotates, some places (near the poles) pass through only a short curve near the top or bottom of the sunlight half; other places (near the equator) travel along much longer curves through the middle. In locations just outside the polar circles, there are so-called "white nights" in the middle of summer, in which the sun is never more than a few degrees below the horizon in June such that a bright twilight persists from sunset to sunrise. In Russia, Saint Petersburg uses this phenomenon in its tourist marketing.
+
+=== Length of the twilight ===
+Longer twilights are observed at higher latitudes (near the poles) due to a shallower angle of the Sun's apparent movement relative to the horizon. On a flat Earth, the Sun's shadow would reach the upper atmosphere very quickly, except near the closest edge of Earth, and the Sun would always set at the same angle to the ground (which is not what is observed).
+The length of twilight would be very different on a flat Earth. On a round Earth, the atmosphere above the ground is lit for a while before sunrise and after sunset are observed at ground level, because the Sun is still visible from higher altitudes.
+The "spotlight Sun" conjecture is also not consistent with this observation, since the air cannot be lit without the ground below it also being lit (except for shadows of mountains, high-rises and other surface obstacles).
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-3.md b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-3.md
new file mode 100644
index 000000000..23274ab0c
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-3.md
@@ -0,0 +1,27 @@
+---
+title: "Empirical evidence for the spherical shape of Earth"
+chunk: 4/7
+source: "https://en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:59.111630+00:00"
+instance: "kb-cron"
+---
+
+=== Observing sunlight before or after seeing Sun ===
+It is possible to see sun-lit windows of nearby high-rise buildings from ground level a few minutes before seeing the sun rise or after seeing the sun set. On a non-curved, flat landmass it would only take seconds, due to the minuscule ratio (compare ~45 meters / 150 feet of a 14-story building to intercontinental distances). If such a phenomenon were caused by a prismatic property of the atmosphere in a flat world, with a relatively small source of light revolving around Earth (as in later, 1800s-dated, maps of Flat Earth), it would contradict one's ability to see a full panorama of the starry sky at night, rather than a small yet distorted, "stretched" patch of it.
+Likewise, the top of a mountain is illuminated before sunrise and after sunset, as are clouds.
+
+=== Watching the sun set twice ===
+On level ground, the difference in the distance to the horizon between lying down and standing up is large enough to watch the Sun set twice by quickly standing up immediately after seeing it set for the first time while lying down. This also can be done with an aerial work platform or with a fast elevator. On a flat Earth or a significantly large flat segment, it would not be possible to see the Sun again (unless standing near the edge closest to the Sun) due to a much faster-moving Sun shadow.
+
+=== Local solar time and time zones ===
+
+Ancient timekeeping reckoned "noon" as the time of day when the Sun is highest in the sky, with the rest of the hours in the day measured against that. During the day, the apparent solar time can be measured directly with a sundial. In ancient Egypt, the first known sundials divided the day into 12 hours, though because the length of the day changed with the season, the length of the hours also changed. Sundials that defined hours as always being the same duration appeared in the Renaissance. In Western Europe, clock towers and striking clocks were used in the Middle Ages to keep people nearby apprised of the local time, though compared to modern times this was less important in a largely agrarian society.
+Because the Sun reaches its highest point at different times for different longitudes (about four minutes of time for every degree of longitude difference east or west), the local solar noon in each city is different except for those directly north or south of each other. This means that the clocks in different cities could be offset from each other by minutes or hours. As clocks became more precise and industrialization made timekeeping more important, cities switched to mean solar time, which ignores minor variations in the timing of local solar noon over the year, due to the elliptical nature of Earth's orbit, and its tilt.
+The differences in clock time between cities were not generally a problem until the advent of railroad travel in the 1800s, which both made travel between distant cities much faster than by walking or horse, and also required passengers to show up at specific times to meet their desired trains. In the United Kingdom, railroads gradually switched to Greenwich Mean Time (set from local time at the Greenwich observatory in London), followed by public clocks across the country generally, forming a single time zone. In the United States, railroads published schedules based on local time, then later based on standard time for that railroad (typically the local time at the railroad's headquarters), and then finally based on four standard time zones shared across all railroads, where neighboring zones differed by exactly one hour. At first railroad time was synchronized by portable chronometers, and then later by telegraph and radio signals.
+San Francisco is at 122.41°W longitude and Richmond, Virginia, is at 77.46°W longitude. They are both at about 37.6°N latitude (±0.2°). The approximately 45° of longitude difference translates into about 180 minutes, or 3 hours, of time between sunsets in the two cities, for example. San Francisco is in the Pacific Time zone, and Richmond is in the Eastern Time zone, which are three hours apart, so the local clocks in each city show that the Sun sets at about the same time when using the local time zone. But a phone call from Richmond to San Francisco at sunset will reveal that there are still three hours of daylight left in California.
+
+=== Determining the size of Earth by Eratosthenes ===
+
+Under the assumption that the Sun is very far away, the ancient Greek geographer Eratosthenes performed an experiment using the differences in the observed angle of the Sun from two different locations to calculate the circumference of Earth. Though modern telecommunications and timekeeping were not available, he was able to make sure the measurements happened at the same time by having them taken when the Sun was highest in the sky (local noon) at both locations. Using slightly inaccurate assumptions about the locations of two cities, he came to a result within 15% of the correct value. While his results could theoretically also be compatible with a Flat Earth if the light rays from the Sun are assumed not to be parallel, many people have repeated the experiment with three or more data points and found results unambiguously supporting the globe model.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-4.md b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-4.md
new file mode 100644
index 000000000..626147f06
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-4.md
@@ -0,0 +1,28 @@
+---
+title: "Empirical evidence for the spherical shape of Earth"
+chunk: 5/7
+source: "https://en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:59.111630+00:00"
+instance: "kb-cron"
+---
+
+=== Angle to the Sun at different locations ===
+On a given day, if many different cities measure the angle of the Sun at local noon, the resulting data, when combined with the known distances between cities, shows that Earth has 180 degrees of north-south curvature. (A full range of angles will be observed if the north and south poles are included, and the day chosen is either the autumnal or spring equinox.) This is consistent with many rounded shapes, including a sphere, and is inconsistent with a flat shape.
+Some claim that this experiment assumes a very distant Sun, such that the incoming rays are essentially parallel, and if a flat Earth is assumed, that the measured angles can allow one to calculate the distance to the Sun, which must be small enough that its incoming rays are not very parallel. However, if more than two relatively well-separated cities are included in the experiment, the calculation will make clear whether the Sun is distant or nearby. For example, on the equinox, the 0-degree angle from the North Pole and the 90-degree angle from the equator predict a Sun which would have to be located essentially next to the surface of a flat Earth, but the difference in angle between the equator and New York City would predict a Sun much further away if Earth is flat. Because these results are contradictory, the surface of Earth cannot be flat; the data are, instead, consistent with a nearly spherical Earth and a Sun which is very far away compared with the diameter of Earth.
+
+== Surface navigation ==
+The first circumnavigation of the Earth by the Magellan expedition lost a day, confirmed by subsequent circumnavigations, which eventually led to the creation of the International Date Line.
+The shortest way to travel between two distant points is by great circle navigation, as ocean navigators have long known. This route shows as curved on any map except for one using a gnomonic projection. Radio waves also follow a great circle, so navies have produced maps using gnomonic projection for use in radio direction finding to locate enemy warships.
+Since the 1500s, many people have sailed or flown completely around Earth in all directions, and none have discovered an edge or impenetrable barrier. (See Arctic exploration and History of Antarctica.)
+Some flat Earth conjectures that propose that Earth is a north-pole-centered disk conceive of Antarctica as an impenetrable ice wall that encircles the planet and hides any edges. This disk model explains east-west circumnavigation as simply moving around the disk in a circle. (East-west paths form a circle in both disk and spherical geometry.) It is possible in this model to traverse the North Pole, but it would not be possible to perform a circumnavigation that includes the South Pole (which it posits does not exist).
+The Arctic Circle is roughly 16,000 km (9,900 mi) long, as is the Antarctic Circle. A "true circumnavigation" of Earth is defined, in order to account for the shape of Earth, to be about 2.5 times as long, including a crossing of the equator, at about 40,000 km (25,000 mi). On the flat Earth model, the ratios would require the Antarctic Circle to be 2.5 times the length of the circumnavigation, or 2.5 × 2.5 = 6.25 times the length of the Arctic Circle.
+Explorers, government researchers, commercial pilots, and tourists have been to Antarctica and found that it is not a large ring that encircles the entirety of Earth, but actually a roughly disk-shaped continent smaller than South America but larger than Australia, with an interior that can in fact be traversed in order to take a shorter path from, for example, the tip of South America to Australia than would be possible on a disk.
+The first land crossing of the entirety of Antarctica was the Commonwealth Trans-Antarctic Expedition in 1955–1958, and many exploratory airplanes have since passed over the continent in various directions. 
+
+== Grid distortion on a spherical surface ==
+A meridian of longitude is a line where local solar noon occurs at the same time each day. These lines define "north" and "south". These are perpendicular to lines of latitude that define "east" and "west", where the Sun is at the same angle at local noon on the same day. If the Sun were travelling from east to west over a flat Earth, meridian lines would always be the same distance apart – they would form a square grid when combined with lines of latitude. In reality, meridian lines get farther apart as one travels toward the equator, which is only possible on a round Earth. In places where land is plotted on a grid system, this causes discontinuities in the grid. For example, in areas of the Midwestern United States that use the Public Land Survey System, the northernmost and westernmost sections of a survey township deviate from what would otherwise be an exact square mile. The resulting discontinuities are sometimes reflected directly in local roads, which have kinks where the grid cannot follow completely straight lines. This distortion also affects how aerial photographs taken over large areas can be stitched together.
+The Mercator projection has examples of size distortions.
+
+=== Spherical versus flat triangles ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-5.md b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-5.md
new file mode 100644
index 000000000..c43ec0566
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-5.md
@@ -0,0 +1,31 @@
+---
+title: "Empirical evidence for the spherical shape of Earth"
+chunk: 6/7
+source: "https://en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:59.111630+00:00"
+instance: "kb-cron"
+---
+
+Because Earth is spherical, long-distance travel sometimes requires heading in different directions than one would head on a flat Earth. An example would be an airplane travelling 10,000 kilometres (6,200 mi) in a straight line, taking a 90-degree right turn, travelling another 10,000 kilometres (6,200 mi), taking another 90-degree right turn, and travelling 10,000 kilometres (6,200 mi) a third time. On a flat Earth, the aircraft would have travelled along three sides of a square, and arrive at a spot about 10,000 kilometres (6,200 mi) from where it started. But because Earth is spherical, in reality it will have travelled along three sides of a triangle, and arrive back very close to its starting point. If the starting point is the North Pole, it would have travelled due south from the North Pole to the equator, then west for a quarter of the way around Earth, and then due north back to the North Pole.
+In spherical geometry, the sum of angles inside a triangle is greater than 180° (in this example 270°, having arrived back at the North Pole at a 90° angle to the departure path), unlike on a flat surface, where it is always exactly 180°.
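+The 270° figure can be checked with Girard's theorem for spherical triangles. The sketch below treats Earth as a perfect sphere and uses symbols that are not in the original text: A, B and C for the interior angles (in radians), S for the area of the triangle, and R for Earth's radius.
+$$A + B + C = \pi + \frac{S}{R^{2}}$$
+The pole-to-equator triangle in the example covers one eighth of the sphere, so $S = 4\pi R^{2}/8 = \pi R^{2}/2$, and the angle sum is $\pi + \pi/2 = 3\pi/2$, that is, 270°.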
+
+== Weather systems ==
+Low-pressure weather systems with inward winds (such as a hurricane) spin counterclockwise north of the equator, but clockwise south of the equator. This is due to the Coriolis force, and requires that (assuming they are attached to each other and rotating in the same direction) the northern and southern halves of Earth are angled in opposite directions (as in, the north is facing toward Polaris and the south is facing away from it).
+
+== Gravity ==
+The laws of gravity, chemistry, and physics that explain the formation and rounding of Earth are well-tested through experiment, and applied successfully to many engineering tasks.
+From these laws, the amount of mass Earth contains is known, as is the fact that a non-spherical planet the size of Earth would not be able to support itself against its own gravity. A disk the size of Earth, for example, would likely crack, heat up, liquefy, and re-form into a roughly spherical shape. On a disk strong enough to maintain its shape, gravity would not pull downward with respect to the surface, but would pull toward the center of the disk, contrary to what is observed on level terrain (and which would cause major problems with oceans flowing toward the center of the disk).
+Ignoring the other concerns, some flat Earth theorists explain the observed surface "gravity" by proposing that the flat Earth is constantly accelerating upwards. Such a conjecture would also leave open for explanation the tides seen in Earth's oceans, which are conventionally explained by the gravity exerted by the Sun and Moon. The Earth would also quickly approach light-speed in this scenario because its upward velocity would increase by 9.8 m/s each second (the acceleration being 9.8 m/s²).
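+As a rough illustration of the time scale involved, a naive Newtonian estimate (ignoring relativistic effects, and taking g ≈ 9.8 m/s² and c ≈ 3.0 × 10⁸ m/s as the only inputs) gives the speed after a time t and the time needed to reach a speed comparable to that of light:
+$$v(t) = g\,t, \qquad t_{c} \approx \frac{c}{g} \approx \frac{3.0\times 10^{8}\ \mathrm{m/s}}{9.8\ \mathrm{m/s^{2}}} \approx 3.1\times 10^{7}\ \mathrm{s} \approx 354\ \text{days}$$
+This is only an order-of-magnitude sketch: in special relativity a constantly accelerated object gets ever closer to c without reaching it.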
+
+== Modern technology ==
+Observations of Foucault pendulums, popular in science museums around the world, demonstrate both that the world is spherical and that it rotates (not that the stars are rotating around it).
+The mathematics of navigation using Global Positioning System (GPS) satellites assumes that they are moving in known orbits around an approximately spherical surface. The accuracy of GPS navigation in determining latitude and longitude and the way these numbers map onto locations on the ground show that these assumptions are correct. The same is true for the operational GLONASS system run by Russia, the in-development European Galileo, the Chinese BeiDou, and the Indian Regional Navigation Satellite System.
+Satellites, including communications satellites used for television, telephone, and Internet connections, would not stay in orbit unless the modern theory of gravitation were correct. The details of which satellites are visible from which places on the ground at which times prove an approximately spherical shape of Earth.
+Radio transmitters are mounted on tall towers because they generally rely on line-of-sight propagation. The distance to the horizon is further at higher altitude, so mounting them higher significantly increases the area they can serve. Some signals can be transmitted at much longer distances, but only if they are at frequencies where they can use groundwave propagation, tropospheric propagation, tropospheric scatter, or ionospheric propagation to reflect or refract signals around the curve of Earth.
+Equatorial mounts allow astronomers to point telescopes at the same celestial object for longer times while compensating for Earth's rotation in an easy way. The axis of an equatorial mount is parallel to Earth's surface when observing stars at Earth's equator – but perpendicular to it when observing from one of Earth's poles. Equatorial mounts were specifically developed for a spherical and rotating Earth. If Earth were flat, an equatorial mount would not make sense.
+Footage of shadows from live webcams can be combined with location and orientation data to locate the sun.
+
+=== Building engineering ===
+The design of some large structures needs to take the shape of Earth into account. For example, the towers of the Humber Bridge, although both vertical with respect to gravity, are 36 mm (1.4 inches) farther apart at the top than the bottom due to Earth's curvature.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-6.md b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-6.md
new file mode 100644
index 000000000..ba745a8db
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth-6.md
@@ -0,0 +1,32 @@
+---
+title: "Empirical evidence for the spherical shape of Earth"
+chunk: 7/7
+source: "https://en.wikipedia.org/wiki/Empirical_evidence_for_the_spherical_shape_of_Earth"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:59.111630+00:00"
+instance: "kb-cron"
+---
+
+=== Aircraft and spacecraft ===
+People in high-flying aircraft or skydiving from high-altitude balloons can plainly see the curvature of Earth. Low-flying planes and commercial airliners do not necessarily fly high enough to make this obvious, especially when passenger windows narrow the field of view or clouds or terrain reduce the effective height from the visible surface. Trying to measure the curvature of the horizon by taking a picture is complicated by the fact that both windows and camera lenses can produce distorted images depending on the angle used. An extreme version of this effect can be seen in the fisheye lens. Scientific measurements would require a carefully calibrated lens.
+Photos of the ground taken from airplanes over a large enough area also do not fit seamlessly together on a flat surface, but do fit on a roughly spherical surface. Aerial photographs of large areas must be corrected to account for curvature.
+Many pictures have been taken of the entirety of Earth by satellites launched by a variety of governments and private organizations. From high orbits, where half the planet can be seen at once, it is plainly spherical. The only way to piece together all the pictures taken of the ground from lower orbits so that all the surface features line up seamlessly and without distortion is to put them on an approximately spherical surface.
+Astronauts in low Earth orbit can personally see the curvature of the planet, and travel all the way around several times a day. The astronauts who travelled to the Moon have seen the entire Moon-facing half at once, and can watch the sphere rotate once a day (approximately; the Moon is also moving with respect to Earth).
+When the supersonic aircraft Concorde took off not long after sunset from London and flew westward to New York, it outran the Sun's apparent motion westward – and therefore passengers aboard observed the Sun rising in the west as they travelled. After landing in New York, passengers watched a second sunset in the west.
+
+Because the speed of the Sun's shadow is slower in polar regions (due to the steeper angle), even a subsonic aircraft can overtake the sunset when flying at high latitudes. One photographer used a roughly circular route around the North Pole to take pictures of 24 sunsets in the same 24-hour period, pausing westward progress in each time zone to let the shadow of the Sun catch up. The surface of Earth rotates at 180.17 miles per hour (289.96 km/h) at 80° north or south, and 1,040.4 miles per hour (1,674.4 km/h) at the equator.
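+The two rotation speeds follow, to a good approximation, from the circumference of the circle of latitude divided by the rotation period. The sketch below rests on assumptions not stated in the text: a spherical Earth with equatorial radius R ≈ 6,378 km, a sidereal rotation period T ≈ 23.934 h, and φ for latitude.
+$$v(\varphi) = \frac{2\pi R\cos\varphi}{T}, \qquad v(0^{\circ}) \approx \frac{2\pi \times 6{,}378\ \mathrm{km}}{23.934\ \mathrm{h}} \approx 1{,}674\ \mathrm{km/h}, \qquad v(80^{\circ}) \approx 1{,}674\ \mathrm{km/h} \times \cos 80^{\circ} \approx 291\ \mathrm{km/h}$$
+This simple spherical estimate agrees with the quoted 80° figure to within about 0.3%; the small difference presumably reflects a more detailed model of Earth's shape.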
+
+=== Ring-laser gyroscope ===
+In the documentary Behind the Curve, Bob Knodel uses a ring-laser gyroscope to attempt to prove that Earth does not rotate. The results instead showed a drift of 15 degrees per hour, due to Earth's rotation.
+
+== See also ==
+
+Earth ellipsoid
+Geodesy
+Spherical Earth
+Timeline of Earth estimates
+
+== References ==
+
+== External links ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-0.md b/data/en.wikipedia.org/wiki/Evidence-0.md
new file mode 100644
index 000000000..696f70e41
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-0.md
@@ -0,0 +1,31 @@
+---
+title: "Evidence"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:48.385077+00:00"
+instance: "kb-cron"
+---
+
+Evidence for a proposition is what supports the proposition. It is usually understood as an indication that the proposition is true. The exact definition and role of evidence vary across different fields.
+In epistemology, evidence is what justifies beliefs or what makes it rational to hold a certain doxastic attitude. For example, a perceptual experience of a tree may serve as evidence to justify the belief that there is a tree. In this role, evidence is usually understood as a private mental state. In phenomenology, evidence is limited to intuitive knowledge, often associated with the controversial assumption that it provides indubitable access to truth.
+In science, scientific evidence is information gained through the scientific method that confirms or disconfirms scientific hypotheses, acting as a neutral arbiter between competing theories. Measurements of Mercury's "anomalous" orbit, for example, are seen as evidence that confirms Einstein's theory of general relativity. The problems of underdetermination and theory-ladenness are two obstacles that threaten to undermine the role of scientific evidence. Philosophers of science tend to understand evidence not as mental states but as verifiable information, observable physical objects or events, secured by following the scientific method.
+In law, evidence is information to establish or refute claims relevant to a case, such as testimony, documentary evidence, and physical evidence.
+The relation between evidence and a supported statement can vary in strength, ranging from weak correlation to indisputable proof. Theories of the evidential relation examine the nature of this connection. Probabilistic approaches hold that something counts as evidence if it increases the probability of the supported statement. According to hypothetico-deductivism, evidence consists in observational consequences of a hypothesis. The positive-instance approach states that an observation sentence is evidence for a universal statement if the sentence describes a positive instance of this statement.
+
+== Philosophy of evidence ==
+
+=== Characteristics ===
+Understood in its broadest sense, evidence for a proposition is what supports this proposition. Traditionally, the term is sometimes understood in a narrower sense: as the intuitive knowledge of facts that are considered indubitable. In this sense, only the singular form is used. This meaning is found especially in phenomenology, in which evidence is elevated to one of the basic principles of philosophy, giving philosophy the ultimate justifications that are supposed to turn it into a rigorous science. In a more modern usage, the plural form is also used. In academic discourse, evidence plays a central role in epistemology and in the philosophy of science. Reference to evidence is made in many different fields, like in science, in the legal system, in history, in journalism and in everyday discourse. A variety of different attempts have been made to conceptualize the nature of evidence. These attempts often proceed by starting with intuitions from one field or in relation to one theoretical role played by evidence and go on to generalize these intuitions, leading to a universal definition of evidence.
+One important intuition is that evidence is what justifies beliefs. This line of thought is usually followed in epistemology and tends to explain evidence in terms of private mental states, for example, as experiences, other beliefs or knowledge. This is closely related to the idea that how rational someone is, is determined by how they respond to evidence. Another intuition, which is more dominant in the philosophy of science, focuses on evidence as that which confirms scientific hypotheses and arbitrates between competing theories. On this view, it is essential that evidence is public so that different scientists can share the same evidence. This leaves publicly observable phenomena like physical objects and events as the best candidates for evidence, unlike private mental states. One problem with these approaches is that the resulting definitions of evidence, both within a field and between fields, vary a lot and are incompatible with each other. For example, it is not clear what a bloody knife and a perceptual experience have in common when both are treated as evidence in different disciplines. This suggests that there is no unitary concept corresponding to the different theoretical roles ascribed to evidence, i.e. that we do not always mean the same thing when we talk of evidence.
+On the other hand, Aristotle, phenomenologists, and numerous scholars accept that there could be several degrees of evidence. For instance, the outcome of a complex equation may become more or less evident to a mathematician after hours of deduction, leaving little doubt about it, whereas a simpler formula would appear more evident to them.
+Riofrio has detected some characteristics that are present in evident arguments and proofs. The more they are evident, the more these characteristics will be present. There are six intrinsic characteristics of evidence:
+
+The truth lies in what is evident, while falsehood or irrationality, although it may appear evident at times, lacks true evidence.
+What is evident aligns coherently with other truths acquired through knowledge. Any insurmountable incoherence would indicate the presence of error or falsehood.
+Evident truths are based on necessary reasoning.
+The simplest truths are the most evident. They are self-explanatory and do not require argumentation to be understood by the intellect. However, for those lacking education, certain complex truths require rational discourse to become evident.
+Evident truths do not need justification; they are indubitable. They are intuitively grasped by the intellect, without the need for further discourse, arguments, or proof.
+Evident truths are clear, translucent, and filled with light.
+In addition, four subjective or external characteristics can be detected over those things that are more or less evident:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-1.md b/data/en.wikipedia.org/wiki/Evidence-1.md
new file mode 100644
index 000000000..66e9806d9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-1.md
@@ -0,0 +1,120 @@
+---
+title: "Evidence"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:48.385077+00:00"
+instance: "kb-cron"
+---
+
+The evident instills certainty and grants the knower a subjective sense of security, as they believe themselves to have aligned with the truth.
+Initially, evident truths are perceived as natural and effortless, as Aristotle highlighted. They are innately present within the intellect, fostering a peaceful and harmonious understanding.
+Consequently, evident truths appear to be widely shared, strongly connected to common sense, which comprises generally accepted beliefs.
+Evident truths are fertile ground: they provide a solid foundation for other branches of scientific knowledge to flourish.
+These ten characteristics of what is evident allowed Riofrio to formulate a test of evidence to detect the level of certainty or evidence that one argument or proof could have.
+
+=== Evidential relation ===
+Philosophers in the 20th century started to investigate the "evidential relation", the relation between evidence and the proposition supported by it.  The issue of the nature of the evidential relation concerns the question of what this relation has to be like in order for one thing to justify a belief or to confirm a hypothesis. Important theories in this field include the probabilistic approach, hypothetico-deductivism and the positive-instance approach.
+Probabilistic approaches, also referred to as Bayesian confirmation theory, explain the evidential relation in terms of probabilities. They hold that all that is necessary is that the existence of the evidence increases the likelihood that the hypothesis is true. This can be expressed mathematically as $P(H\mid E)>P(H)$. In words: a piece of evidence (E) confirms a hypothesis (H) if the conditional probability of this hypothesis relative to the evidence is higher than the unconditional probability of the hypothesis by itself. Smoke (E), for example, is evidence that there is a fire (H), because the two usually occur together, which is why the likelihood of fire given that there is smoke is higher than the likelihood of fire by itself. On this view, evidence is akin to an indicator or a symptom of the truth of the hypothesis. Against this approach, it has been argued that it is too liberal because it allows accidental generalizations as evidence. Finding a nickel in one's pocket, for example, raises the probability of the hypothesis that "All the coins in my pockets are nickels". But, according to Alvin Goldman, it should not be considered evidence for this hypothesis since there is no lawful connection between this one nickel and the other coins in the pocket.
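+A worked numerical illustration of the smoke and fire case, using made-up probabilities chosen only for the example (none of these values come from the text): suppose the prior probability of fire is P(H) = 0.01, smoke is observed in 90% of fires, so P(E∣H) = 0.9, and smoke is observed in 5% of situations without fire, so P(E∣¬H) = 0.05. Bayes' theorem then gives:
+$$P(H\mid E) = \frac{P(E\mid H)\,P(H)}{P(E\mid H)\,P(H) + P(E\mid \neg H)\,P(\neg H)} = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99} = \frac{0.009}{0.0585} \approx 0.154$$
+Since 0.154 > 0.01, smoke counts as confirming evidence for fire in the probabilistic sense, even though fire remains unlikely overall under these assumed numbers.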
+Hypothetico-deductivism is a non-probabilistic approach that characterizes the evidential relations in terms of deductive consequences of the hypothesis. According to this view, "evidence for a hypothesis is a true observational consequence of that hypothesis". One problem with the characterization so far is that hypotheses usually contain relatively little information and therefore have few if any deductive observational consequences. So the hypothesis by itself that there is a fire does not entail that smoke is observed. Instead, various auxiliary assumptions have to be included about the location of the smoke, the fire, the observer, the lighting conditions, the laws of chemistry, etc. In this way, the evidential relation becomes a three-place relation between evidence, hypothesis and auxiliary assumptions. This means that whether a thing is evidence for a hypothesis depends on the auxiliary assumptions one holds. This approach fits well with various scientific practices. For example, it is often the case that experimental scientists try to find evidence that would confirm or disconfirm a proposed theory. The hypothetico-deductive approach can be used to predict what should be observed in an experiment if the theory was true. It thereby explains the evidential relation between the experiment and the theory. One problem with this approach is that it cannot distinguish between relevant and certain irrelevant cases. So if smoke is evidence for the hypothesis "there is fire", then it is also evidence for conjunctions including this hypothesis, for example, "there is fire and Socrates was wise", despite the fact that Socrates's wisdom is irrelevant here.
+According to the positive-instance approach, an observation sentence is evidence for a universal hypothesis if the sentence describes a positive instance of this hypothesis. For example, the observation that "this swan is white" is an instance of the universal hypothesis that "all swans are white". This approach can be given a precise formulation in first-order logic: a proposition is evidence for a hypothesis if it entails the "development of the hypothesis". Intuitively, the development of the hypothesis is what the hypothesis states if it was restricted to only the individuals mentioned in the evidence. In the case above, we have the hypothesis "$\forall x(swan(x)\rightarrow white(x))$" (all swans are white) which, when restricted to the domain "{a}", containing only the one individual mentioned in the evidence, entails the evidence, i.e. "$swan(a)\land white(a)$" (this swan is white). One important shortcoming of this approach is that it requires that the hypothesis and the evidence are formulated in the same vocabulary, i.e. use the same predicates, like "$swan$" or "$white$" above. But many scientific theories posit theoretical objects, like electrons or strings in physics, that are not directly observable and therefore cannot show up in the evidence as conceived here.
+
+=== In specific fields ===
+Important theorists of evidence include Bertrand Russell, Willard Van Orman Quine, the logical positivists, Timothy Williamson, Earl Conee and Richard Feldman. Russell, Quine and the logical positivists belong to the empiricist tradition and hold that evidence consists in sense data, stimulation of one's sensory receptors and observation statements, respectively. According to Williamson, all and only knowledge constitute evidence. Conee and Feldman hold that only one's current mental states should be considered evidence.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-2.md b/data/en.wikipedia.org/wiki/Evidence-2.md
new file mode 100644
index 000000000..e3e891587
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-2.md
@@ -0,0 +1,17 @@
+---
+title: "Evidence"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:48.385077+00:00"
+instance: "kb-cron"
+---
+
+==== In epistemology ====
+The guiding intuition within epistemology concerning the role of evidence is that it is what justifies beliefs. For example, Phoebe's auditory experience of the music justifies her belief that the speakers are on. Evidence has to be possessed by the believer in order to play this role. So Phoebe's own experiences can justify her own beliefs but not someone else's beliefs. Some philosophers hold that evidence possession is restricted to conscious mental states, for example, to sense data. This view has the implausible consequence that many simple everyday beliefs would be unjustified. The more common view is that all kinds of mental states, including stored beliefs that are currently unconscious, can act as evidence. It is sometimes argued that the possession of a mental state capable of justifying another is not sufficient for the justification to happen. The idea behind this line of thought is that justified belief has to be connected to or grounded in the mental state acting as its evidence. So Phoebe's belief that the speakers are on is not justified by her auditory experience if the belief is not based in this experience. This would be the case, for example, if Phoebe has both the experience and the belief but is unaware of the fact that the music is produced by the speakers.
+It is sometimes held that only propositional mental states can play this role, a position known as "propositionalism". A mental state is propositional if it is an attitude directed at a propositional content. Such attitudes are usually expressed by verbs like "believe" together with a that-clause, as in "Robert believes that the corner shop sells milk". Such a view denies that sensory impressions can act as evidence. This is often held as an argument against this view since sensory impressions are commonly treated as evidence. Propositionalism is sometimes combined with the view that only attitudes to true propositions can count as evidence. On this view, the belief that the corner shop sells milk only constitutes evidence for the belief that the corner shop sells dairy products if the corner shop actually sells milk. Against this position, it has been argued that evidence can be misleading but still count as evidence.
+This line of thought is often combined with the idea that evidence, propositional or otherwise, determines what it is rational for us to believe. But it can be rational to have a false belief. This is the case when we possess misleading evidence. For example, it was rational for Neo in the Matrix movie to believe that he was living in the 20th century because of all the evidence supporting his belief despite the fact that this evidence was misleading since it was part of a simulated reality. This account of evidence and rationality can also be extended to other doxastic attitudes, like disbelief and suspension of belief. So rationality does not just demand that we believe something if we have decisive evidence for it, it also demands that we disbelieve something if we have decisive evidence against it and that we suspend belief if we lack decisive evidence either way.
+
+==== In phenomenology ====
+The meaning of the term "evidence" in phenomenology shows many parallels to its epistemological usage, but it is understood in a narrower sense. Thus, evidence here specifically refers to intuitive knowledge, which is described as "self-given" (selbst-gegeben). This contrasts with empty intentions, in which one refers to states of affairs through a certain opinion, but without an intuitive presentation. This is why evidence is often associated with the controversial thesis that it constitutes an immediate access to truth. In this sense, the evidently given phenomenon guarantees its own truth and is therefore considered indubitable. Due to this special epistemological status of evidence, it is regarded in phenomenology as the basic principle of all philosophy. In this form, it represents the lowest foundation of knowledge, which consists of indubitable insights upon which all subsequent knowledge is built. This evidence-based method is meant to make it possible for philosophy to overcome many of the traditionally unresolved disagreements and thus become a rigorous science. This far-reaching claim of phenomenology, based on absolute certainty, is one of the focal points of criticism by its opponents. Thus, it has been argued that even knowledge based on self-evident intuition is fallible. This can be seen, for example, in the fact that even among phenomenologists, there is much disagreement about the basic structures of experience.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-3.md b/data/en.wikipedia.org/wiki/Evidence-3.md
new file mode 100644
index 000000000..cf8d8f340
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-3.md
@@ -0,0 +1,25 @@
+---
+title: "Evidence"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:48.385077+00:00"
+instance: "kb-cron"
+---
+
+==== In philosophy of science ====
+In the sciences, evidence is understood as what confirms or disconfirms scientific hypotheses. The term "confirmation" is sometimes used synonymously with that of "evidential support". Measurements of Mercury's "anomalous" orbit, for example, are seen as evidence that confirms Einstein's theory of general relativity. This is especially relevant for choosing between competing theories. So in the case above, evidence plays the role of neutral arbiter between Newton's and Einstein's theory of gravitation. This is only possible if scientific evidence is public and uncontroversial so that proponents of competing scientific theories agree on what evidence is available. These requirements suggest scientific evidence consists not of private mental states but of public physical objects or events.
+It is often held that evidence is in some sense prior to the hypotheses it confirms. This was sometimes understood as temporal priority, i.e. that we come first to possess the evidence and later form the hypothesis through induction. But this temporal order is not always reflected in scientific practice, where experimental researchers may look for a specific piece of evidence in order to confirm or disconfirm a pre-existing hypothesis. Logical positivists, on the other hand, held that this priority is semantic in nature, i.e. that the meanings of the theoretical terms used in the hypothesis are determined by what would count as evidence for them. Counterexamples for this view come from the fact that our idea of what counts as evidence may change while the meanings of the corresponding theoretical terms remain constant. The most plausible view is that this priority is epistemic in nature, i.e. that our belief in a hypothesis is justified based on the evidence while the justification for the belief in the evidence does not depend on the hypothesis.
+A central issue for the scientific conception of evidence is the problem of underdetermination, i.e. that the evidence available supports competing theories equally well. So, for example, evidence from our everyday life about how gravity works confirms Newton's and Einstein's theory of gravitation equally well and is therefore unable to establish consensus among scientists. But in such cases, it is often the gradual accumulation of evidence that eventually leads to an emerging consensus. This evidence-driven process towards consensus seems to be one hallmark of the sciences not shared by other fields.
+Another problem for the conception of evidence in terms of confirmation of hypotheses is that what some scientists consider the evidence to be may already involve various theoretical assumptions not shared by other scientists. This phenomenon is known as theory-ladenness. Some cases of theory-ladenness are relatively uncontroversial, for example, that the numbers output by a measurement device need additional assumptions about how this device works and what was measured in order to count as meaningful evidence. Other putative cases are more controversial, for example, the idea that different people or cultures perceive the world through different, incommensurable conceptual schemes, leading them to very different impressions about what is the case and what evidence is available. Theory-ladenness threatens to impede the role of evidence as neutral arbiter since these additional assumptions may favor some theories over others. It could thereby also prevent a consensus from emerging, since the different parties may be unable to agree even on what the evidence is. When understood in the widest sense, it is not controversial that some form of theory-ladenness exists. But it is questionable whether it constitutes a serious threat to scientific evidence when understood in this sense.
+
+== Different types of evidence ==
+
+=== In science (empirical evidence) ===
+
+In scientific research evidence is accumulated through observations of phenomena that occur in the natural world, or which are created as experiments in a laboratory or other controlled conditions. Scientists tend to focus on how the data used during statistical inference are generated. Scientific evidence usually goes towards supporting or rejecting a hypothesis.
+The burden of proof is on the person making a contentious claim.  Within science, this translates to the burden resting on presenters of a paper, in which the presenters argue for their specific findings. This paper is placed before a panel of judges where the presenter must defend the thesis against all challenges.
+When evidence is contradictory to predicted expectations, the evidence and the ways of making it are often closely scrutinized (see experimenter's regress) and only at the end of this process is the hypothesis rejected: this can be referred to as 'refutation of the hypothesis'.  The rules for evidence used by science are collected systematically in an attempt to avoid the bias inherent to anecdotal evidence.
+
+=== In law ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-4.md b/data/en.wikipedia.org/wiki/Evidence-4.md
new file mode 100644
index 000000000..aedd80706
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-4.md
@@ -0,0 +1,51 @@
+---
+title: "Evidence"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:48.385077+00:00"
+instance: "kb-cron"
+---
+
+In law, the production and presentation of evidence depend first on establishing on whom the burden of proof lies. Admissible evidence is that which a court receives and considers for the purposes of deciding a particular case. Two primary burden-of-proof considerations exist in law.  The first is on whom the burden rests.  In many, especially Western, courts, the burden of proof is placed on the prosecution in criminal cases and the plaintiff in civil cases.  The second consideration is the degree of certitude proof must reach, depending on both the quantity and quality of evidence.  These degrees are different for criminal and civil cases, the former requiring evidence beyond a reasonable doubt, the latter considering only which side has the preponderance of evidence, or whether the proposition is more likely true or false.
+The parts of a legal case that are not in controversy are known, in general, as the "facts of the case." Beyond any facts that are undisputed, a judge or jury is usually tasked with being a trier of fact for the other issues of a case. Evidence and rules are used to decide questions of fact that are disputed, some of which may be determined by the legal burden of proof relevant to the case. Evidence in certain cases (e.g. capital crimes) must be more compelling than in other situations (e.g. minor civil disputes), which drastically affects the quality and quantity of evidence necessary to decide a case. The decision-maker, often a jury but sometimes a judge, decides whether the burden of proof has been fulfilled. Once it has been decided who will carry the burden of proof, the evidence is first gathered and then presented before the court:
+
+==== Collection ====
+
+In a criminal investigation, rather than attempting to prove an abstract or hypothetical point, the evidence gatherers attempt to determine who is responsible for a criminal act.  The focus of criminal evidence is to connect physical evidence and reports of witnesses to a specific person.
+
+==== Presentation ====
+The path that physical evidence takes from the scene of a crime or the arrest of a suspect to the courtroom is called the chain of custody. In a criminal case, this path must be clearly documented or attested to by those who handled the evidence. If the chain of custody is broken, a defendant may be able to persuade the judge to declare the evidence inadmissible.
+Presenting evidence before the court differs from the gathering of evidence in important ways. Gathering evidence may take many forms; presenting evidence that tends to prove or disprove the point at issue is strictly governed by rules. Failure to follow these rules leads to any number of consequences. In law, certain policies allow (or require) evidence to be excluded from consideration based either on indicia relating to reliability, or broader social concerns. Testimony (which tells) and exhibits (which show) are the two main categories of evidence presented at a trial or hearing. In the United States, evidence in federal court is admitted or excluded under the Federal Rules of Evidence.
+
+==== Burden of proof ====
+
+The burden of proof is the obligation of a party in an argument or dispute to provide sufficient evidence to shift the other party's or a third party's belief from their initial position. The burden of proof must be fulfilled by both establishing confirming evidence and negating oppositional evidence. Conclusions drawn from evidence may be subject to criticism based on a perceived failure to fulfill the burden of proof.
+Two principal considerations are:
+
+On whom does the burden of proof rest?
+To what degree of certitude must the assertion be supported?
+The latter question depends on the nature of the point under contention and determines the quantity and quality of evidence required to meet the burden of proof.
+In a criminal trial in the United States, for example, the prosecution carries the burden of proof since the defendant is presumed innocent until proven guilty beyond a reasonable doubt. Similarly, in most civil procedures, the plaintiff carries the burden of proof and must convince a judge or jury that the preponderance of the evidence is on their side. Other legal standards of proof include "reasonable suspicion", "probable cause" (as for arrest), "prima facie evidence", "credible evidence", "substantial evidence", and "clear and convincing evidence".
+In a philosophical debate, there is an implicit burden of proof on the party asserting a claim, since the default position is generally one of neutrality or unbelief. Each party in a debate will therefore carry the burden of proof for any assertion they make in the argument, although some assertions may be granted by the other party without further evidence. If the debate is set up as a resolution to be supported by one side and refuted by another, the overall burden of proof is on the side supporting the resolution.
+
+==== Specific types ====
+Digital evidence
+Physical evidence
+Relationship evidence
+Testimonial evidence
+Trace evidence
+
+== See also ==
+
+== References ==
+
+== External links ==
+
+Evidence at PhilPapers
+Zalta, Edward N. (ed.). "Evidence". Stanford Encyclopedia of Philosophy. ISSN 1095-5054. OCLC 429049174.
+Fieser, James; Dowden, Bradley (eds.). "Evidence". Internet Encyclopedia of Philosophy. ISSN 2161-0002. OCLC 37741658.
+Evidence at the Indiana Philosophy Ontology Project
+ASTM E141 Standard Practice for Acceptance of Evidence Based on the Results of Probability Sampling
+"Evidence". Encyclopædia Britannica (11th ed.). 1911.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-based_medicine-0.md b/data/en.wikipedia.org/wiki/Evidence-based_medicine-0.md
new file mode 100644
index 000000000..509e2bc2f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-based_medicine-0.md
@@ -0,0 +1,30 @@
+---
+title: "Evidence-based medicine"
+chunk: 1/6
+source: "https://en.wikipedia.org/wiki/Evidence-based_medicine"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:04.124595+00:00"
+instance: "kb-cron"
+---
+
+Evidence-based medicine (EBM), sometimes known within healthcare as evidence-based practice (EBP), is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. It means integrating individual clinical expertise with the best available external clinical evidence from systematic research." The aim of EBM is to integrate the experience of the clinician, the values of the patient, and the best available scientific information to guide decision-making about clinical management.  The term was originally used to describe an approach to teaching the practice of medicine and improving decisions by individual physicians about individual patients.
+The EBM Pyramid is a tool that helps in visualizing the hierarchy of evidence in medicine, from least authoritative, like expert opinions, to most authoritative, like systematic reviews.
+Adoption of evidence-based medicine is necessary in a human rights-based approach to public health and a precondition for accessing the right to health.
+
+== Background, history, and definition ==
+Medicine has a long history of scientific inquiry into the prevention, diagnosis, and treatment of human disease. In the 11th century AD, Avicenna, a Persian physician and philosopher, developed an approach to EBM that was mostly similar to current ideas and practises.
+The concept of a controlled clinical trial was first described in 1662 by Jan Baptist van Helmont in reference to the practice of bloodletting. Wrote Van Helmont:
+
+Let us take out of the Hospitals, out of the Camps, or from elsewhere, 200, or 500 poor People, that have fevers or Pleuritis. Let us divide them in Halfes, let us cast lots, that one halfe of them may fall to my share, and the others to yours; I will cure them without blood-letting and sensible evacuation; but you do, as ye know ... we shall see how many Funerals both of us shall have...
+The first published report describing the conduct and results of a controlled clinical trial was by James Lind, a Scottish naval surgeon who conducted research on scurvy during his time aboard HMS Salisbury in the Channel Fleet, while patrolling the Bay of Biscay. Lind divided the sailors participating in his experiment into six groups, so that the effects of various treatments could be fairly compared. Lind found improvement in symptoms and signs of scurvy among the group of men treated with lemons or oranges. He published a treatise describing the results of this experiment in 1753.
+An early critique of statistical methods in medicine was published in 1835, in Comptes Rendus de l'Académie des Sciences, Paris, by a man referred to as "Mr Civiale".
+In 1990, Gordon Guyatt, then a young internal medicine residency coordinator at McMaster University, introduced a teaching method he initially termed "Scientific Medicine." This approach emphasized applying critical appraisal techniques directly to bedside clinical decision-making, building on the work of his mentor, David Sackett. However, the concept met resistance from colleagues, as it implied that existing clinical practices lacked scientific rigor, even though this was likely true. To address this, Guyatt rebranded the approach as "Evidence-Based Medicine", a term first formally introduced in a 1991 editorial in the ACP Journal Club. Although the name was coined in 1991, it took several more years and the concerted efforts of many other teams to define the foundations of this method.
+Although more popular in medicine, the concept of "evidence-based" is spreading to other disciplines, such as the humanities, and to languages other than English, albeit at a slower pace.
+
+=== Clinical decision-making ===
+Alvan Feinstein's publication of Clinical Judgment in 1967 focused attention on the role of clinical reasoning and identified biases that can affect it. In 1972, Archie Cochrane published Effectiveness and Efficiency, which described the lack of controlled trials supporting many practices that had previously been assumed to be effective. In 1973, John Wennberg began to document wide variations in how physicians practiced. Through the 1980s, David M. Eddy described errors in clinical reasoning and gaps in evidence. In the mid-1980s, Alvan Feinstein, David Sackett and others published textbooks on clinical epidemiology, which translated epidemiological methods to physician decision-making. Toward the end of the 1980s, a group at RAND showed that large proportions of procedures performed by physicians were considered inappropriate even by the standards of their own experts.
+
+=== Evidence-based guidelines and policies ===
+
+David M. Eddy first began to use the term 'evidence-based' in 1987 in workshops and a manual commissioned by the Council of Medical Specialty Societies to teach formal methods for designing clinical practice guidelines. The manual was eventually published by the American College of Physicians. Eddy first published the term 'evidence-based' in March 1990, in an article in the Journal of the American Medical Association (JAMA) that laid out the principles of evidence-based guidelines and population-level policies, which Eddy described as "explicitly describing the available evidence that pertains to a policy and tying the policy to evidence instead of standard-of-care practices or the beliefs of experts. The pertinent evidence must be identified, described, and analyzed. The policymakers must determine whether the policy is justified by the evidence. A rationale must be written." He discussed evidence-based policies in several other papers published in JAMA in the spring of 1990. Those papers were part of a series of 28 published in JAMA between 1990 and 1997 on formal methods for designing population-level guidelines and policies.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-based_medicine-1.md b/data/en.wikipedia.org/wiki/Evidence-based_medicine-1.md
new file mode 100644
index 000000000..e4979e2e3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-based_medicine-1.md
@@ -0,0 +1,23 @@
+---
+title: "Evidence-based medicine"
+chunk: 2/6
+source: "https://en.wikipedia.org/wiki/Evidence-based_medicine"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:04.124595+00:00"
+instance: "kb-cron"
+---
+
+=== Medical education ===
+The term 'evidence-based medicine' was introduced slightly later, in the context of medical education. In the autumn of 1990, Gordon Guyatt used it in an unpublished description of a program at McMaster University for prospective or new medical students. Guyatt and others first published the term two years later (1992) to describe a new approach to teaching the practice of medicine.
+In 1996, David Sackett and colleagues clarified the definition of this tributary of evidence-based medicine as "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. ... [It] means integrating individual clinical expertise with the best available external clinical evidence from systematic research." This branch of evidence-based medicine aims to make individual decision making more structured and objective by better reflecting the evidence from research. Population-based data are applied to the care of an individual patient, while respecting the fact that practitioners have clinical expertise reflected in effective and efficient diagnosis and thoughtful identification and compassionate use of individual patients' predicaments, rights, and preferences.
+Between 1993 and 2000, the Evidence-Based Medicine Working Group at McMaster University published the methods to a broad physician audience in a series of 25 "Users' Guides to the Medical Literature" in JAMA. In 1995 Rosenberg and Donald defined individual-level, evidence-based medicine as "the process of finding, appraising, and using contemporaneous research findings as the basis for medical decisions." In 2010, Greenhalgh used a definition that emphasized quantitative methods: "the use of mathematical estimates of the risk of benefit and harm, derived from high-quality research on population samples, to inform clinical decision-making in the diagnosis, investigation or management of individual patients."
+The two original definitions highlight important differences in how evidence-based medicine is applied to populations versus individuals. When designing guidelines applied to large groups of people in settings with relatively little opportunity for modification by individual physicians, evidence-based policymaking emphasizes that good evidence should exist to document a test's or treatment's effectiveness. In the setting of individual decision-making, practitioners can be given greater latitude in how they interpret research and combine it with their clinical judgment. In 2005, Eddy offered an umbrella definition for the two branches of EBM: "Evidence-based medicine is a set of principles and methods intended to ensure that to the greatest extent possible, medical decisions, guidelines, and other types of policies are based on and consistent with good evidence of effectiveness and benefit."
+
+=== Progress ===
+In the area of evidence-based guidelines and policies, the explicit insistence on evidence of effectiveness was introduced by the American Cancer Society in 1980. The U.S. Preventive Services Task Force (USPSTF) began issuing guidelines for preventive interventions based on evidence-based principles in 1984. In 1985, the Blue Cross Blue Shield Association applied strict evidence-based criteria for covering new technologies. Beginning in 1987, specialty societies such as the American College of Physicians, and voluntary health organizations such as the American Heart Association, wrote many evidence-based guidelines. In 1991, Kaiser Permanente, a managed care organization in the US, began an evidence-based guidelines program. In 1991, Richard Smith wrote an editorial in the British Medical Journal and introduced the ideas of evidence-based policies in the UK. In 1993, the Cochrane Collaboration created a network of 13 countries to produce systematic reviews and guidelines. In 1997, the US Agency for Healthcare Research and Quality (AHRQ, then known as the Agency for Health Care Policy and Research, or AHCPR) established Evidence-based Practice Centers (EPCs) to produce evidence reports and technology assessments to support the development of guidelines. In the same year, a National Guideline Clearinghouse that followed the principles of evidence-based policies was created by AHRQ, the AMA, and the American Association of Health Plans (now America's Health Insurance Plans). In 1999, the National Institute for Clinical Excellence (NICE) was created in the UK to circulate evidence and guidance on treatments within the NHS.
+In the area of medical education, medical schools in Canada, the US, the UK, Australia, and other countries now offer programs that teach evidence-based medicine. A 2009 study of UK programs found that more than half of UK medical schools offered some training in evidence-based medicine, although the methods and content varied considerably, and EBM teaching was restricted by lack of curriculum time, trained tutors and teaching materials. Many programs have been developed to help individual physicians gain better access to evidence. For example, UpToDate was created in the early 1990s. The Cochrane Collaboration began publishing evidence reviews in 1993. In 1995, BMJ Publishing Group launched Clinical Evidence, a 6-monthly periodical that provided brief summaries of the current state of evidence about important clinical questions for clinicians.
+
+=== Current practice ===
+By 2000, use of the term evidence-based had extended to other levels of the health care system. An example is evidence-based health services, which seek to increase the competence of health service decision makers and the practice of evidence-based medicine at the organizational or institutional level.
+The multiple tributaries of evidence-based medicine share an emphasis on the importance of incorporating evidence from formal research in medical policies and decisions. However, because they differ on the extent to which they require good evidence of effectiveness before promoting a guideline or payment policy, a distinction is sometimes made between evidence-based medicine and science-based medicine, which also takes into account factors such as prior plausibility and compatibility with established science (as when medical organizations promote controversial treatments such as acupuncture).  Differences also exist regarding the extent to which it is feasible to incorporate individual-level information in decisions. Thus, evidence-based guidelines and policies may not readily "hybridise" with experience-based practices orientated towards ethical clinical judgement, and can lead to contradictions, contest, and unintended crises. The most effective "knowledge leaders" (managers and clinical leaders) use a broad range of management knowledge in their decision making, rather than just formal evidence. Evidence-based guidelines may provide the basis for governmentality in health care, and consequently play a central role in the governance of contemporary health care systems.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-based_medicine-2.md b/data/en.wikipedia.org/wiki/Evidence-based_medicine-2.md
new file mode 100644
index 000000000..6c3033391
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-based_medicine-2.md
@@ -0,0 +1,35 @@
+---
+title: "Evidence-based medicine"
+chunk: 3/6
+source: "https://en.wikipedia.org/wiki/Evidence-based_medicine"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:04.124595+00:00"
+instance: "kb-cron"
+---
+
+== Methods ==
+
+=== Steps ===
+The steps for designing explicit, evidence-based guidelines were described in the late 1980s: formulate the question (population, intervention, comparison intervention, outcomes, time horizon, setting); search the literature to identify studies that inform the question; interpret each study to determine precisely what it says about the question; if several studies address the question, synthesize their results (meta-analysis); summarize the evidence in evidence tables; compare the benefits, harms and costs in a balance sheet; draw a conclusion about the preferred practice; write the guideline; write the rationale for the guideline; have others review each of the previous steps; implement the guideline.
+For the purposes of medical education and individual-level decision making, five steps of EBM in practice were described in 1992, and the experience of delegates attending the 2003 Conference of Evidence-Based Health Care Teachers and Developers was summarized into five steps and published in 2005. This five-step process can broadly be categorized as follows:
+
+Translation of uncertainty to an answerable question; includes critical questioning, study design and levels of evidence
+Systematic retrieval of the best evidence available
+Critical appraisal of evidence for internal validity that can be broken down into aspects regarding:
+Systematic errors as a result of selection bias, information bias and confounding
+Quantitative aspects of diagnosis and treatment
+The effect size and aspects regarding its precision
+Clinical importance of results
+External validity or generalizability
+Application of results in practice
+Evaluation of performance
+
+=== Evidence reviews ===
+Systematic reviews of published research studies are a major part of the evaluation of particular treatments. The Cochrane Collaboration is one of the best-known organisations that conducts systematic reviews. Like other producers of systematic reviews, it requires authors to provide a detailed study protocol as well as a reproducible plan of their literature search and evaluations of the evidence. After the best evidence is assessed, treatment is categorized as (1) likely to be beneficial, (2) likely to be harmful, or (3) without evidence to support either benefit or harm.
+A 2007 analysis of 1,016 systematic reviews from all 50 Cochrane Collaboration Review Groups found that 44% of the reviews concluded that the intervention was likely to be beneficial, 7% concluded that the intervention was likely to be harmful, and 49% concluded that evidence did not support either benefit or harm. 96% recommended further research. In 2017, a study assessed the role of systematic reviews produced by the Cochrane Collaboration in informing US private payers' policymaking; it showed that although the medical policy documents of major US private payers were informed by Cochrane systematic reviews, there was still scope to encourage their further use.
+
+=== Assessing the quality of evidence ===
+
+Evidence-based medicine categorizes different types of clinical evidence and rates or grades them according to the strength of their freedom from the various biases that beset medical research. For example, the strongest evidence for therapeutic interventions is provided by systematic review of randomized, well-blinded, placebo-controlled trials with allocation concealment and complete follow-up involving a homogeneous patient population and medical condition. In contrast, patient testimonials, case reports, and even expert opinion have little value as proof because of the placebo effect, the biases inherent in observation and reporting of cases, and difficulties in ascertaining who is an expert (however, some critics have argued that expert opinion "does not belong in the rankings of the quality of empirical evidence because it does not represent a form of empirical evidence" and continue that "expert opinion would seem to be a separate, complex type of knowledge that would not fit into hierarchies otherwise limited to empirical evidence alone.").
+Several organizations have developed grading systems for assessing the quality of evidence. For example, in 1989 the U.S. Preventive Services Task Force (USPSTF) put forth the following system:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-based_medicine-3.md b/data/en.wikipedia.org/wiki/Evidence-based_medicine-3.md
new file mode 100644
index 000000000..c2232483a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-based_medicine-3.md
@@ -0,0 +1,38 @@
+---
+title: "Evidence-based medicine"
+chunk: 4/6
+source: "https://en.wikipedia.org/wiki/Evidence-based_medicine"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:04.124595+00:00"
+instance: "kb-cron"
+---
+
+Level I: Evidence obtained from at least one properly designed randomized controlled trial.
+Level II-1: Evidence obtained from well-designed controlled trials without randomization.
+Level II-2: Evidence obtained from well-designed cohort studies or case-control studies, preferably from more than one center or research group.
+Level II-3: Evidence obtained from multiple time series designs with or without the intervention. Dramatic results in uncontrolled trials might also be regarded as this type of evidence.
+Level III: Opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.
+Another example is the Oxford CEBM Levels of Evidence published by the Centre for Evidence-Based Medicine. First released in September 2000, the Levels of Evidence provide a way to rank evidence for claims about prognosis, diagnosis, treatment benefits, treatment harms, and screening, which most grading schemes do not address. The original CEBM Levels were first released for Evidence-Based On Call to make the process of finding evidence feasible and its results explicit. In 2011, an international team redesigned the Oxford CEBM Levels to make them more understandable and to take into account recent developments in evidence ranking schemes. The Oxford CEBM Levels of Evidence have been used by patients and clinicians, as well as by experts to develop clinical guidelines, such as recommendations for the optimal use of phototherapy and topical therapy in psoriasis and guidelines for the use of the BCLC staging system for diagnosing and monitoring hepatocellular carcinoma in Canada.
+In 2000, a system was developed by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group. The GRADE system takes into account more dimensions than just the quality of medical research. It requires users who are performing an assessment of the quality of evidence, usually as part of a systematic review, to consider the impact of different factors on their confidence in the results. Authors of GRADE tables assign one of four levels to evaluate the quality of evidence, on the basis of their confidence that the observed effect (a numeric value) is close to the true effect. The confidence value is based on judgments assigned in five different domains in a structured manner. The GRADE working group defines 'quality of evidence' and 'strength of recommendations' based on the quality as two different concepts that are commonly confused with each other.
+Systematic reviews may include randomized controlled trials that have low risk of bias, or observational studies that have high risk of bias. In the case of randomized controlled trials, the quality of evidence is high but can be downgraded in five different domains.
+
+Risk of bias: A judgment made on the basis of the chance that bias in included studies has influenced the estimate of effect.
+Imprecision: A judgment made on the basis of the chance that the observed estimate of effect could change completely.
+Indirectness: A judgment made on the basis of the differences in characteristics of how the study was conducted and how the results are actually going to be applied.
+Inconsistency: A judgment made on the basis of the variability of results across the included studies.
+Publication bias: A judgment made on the basis of the question whether all the research evidence has been taken into account.
+In the case of observational studies per GRADE, the quality of evidence starts off lower and may be upgraded in three domains in addition to being subject to downgrading.
+
+Large effect: Methodologically strong studies show that the observed effect is so large that it is unlikely to change completely.
+Plausible confounding would change the effect: Despite the presence of a possible confounding factor that is expected to reduce the observed effect, the effect estimate still shows a significant effect.
+Dose response gradient: The intervention used becomes more effective with increasing dose. This suggests that a further increase will likely bring about more effect.
+Meaning of the levels of quality of evidence as per GRADE:
+
+High Quality Evidence: The authors are very confident that the presented estimate lies very close to the true value. In other words, the probability is very low that further research will completely change the presented conclusions.
+Moderate Quality Evidence: The authors are confident that the presented estimate lies close to the true value, but it is also possible that it may be substantially different. In other words, further research may completely change the conclusions.
+Low Quality Evidence: The authors are not confident in the effect estimate, and the true value may be substantially different. In other words, further research is likely to change the presented conclusions completely.
+Very Low Quality Evidence: The authors do not have any confidence in the estimate and it is likely that the true value is substantially different from it. In other words, new research will probably change the presented conclusions completely.
+
+=== Categories of recommendations ===
+In guidelines and other publications, recommendation for a clinical service is classified by the balance of risk versus benefit and the level of evidence on which this information is based. The U.S. Preventive Services Task Force uses the following system:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-based_medicine-4.md b/data/en.wikipedia.org/wiki/Evidence-based_medicine-4.md
new file mode 100644
index 000000000..2d24f8ff3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-based_medicine-4.md
@@ -0,0 +1,36 @@
+---
+title: "Evidence-based medicine"
+chunk: 5/6
+source: "https://en.wikipedia.org/wiki/Evidence-based_medicine"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:04.124595+00:00"
+instance: "kb-cron"
+---
+
+Level A: Good scientific evidence suggests that the benefits of the clinical service substantially outweigh the potential risks. Clinicians should discuss the service with eligible patients.
+Level B: At least fair scientific evidence suggests that the benefits of the clinical service outweigh the potential risks. Clinicians should discuss the service with eligible patients.
+Level C: At least fair scientific evidence suggests that the clinical service provides benefits, but the balance between benefits and risks is too close for general recommendations. Clinicians need not offer it unless individual considerations apply.
+Level D: At least fair scientific evidence suggests that the risks of the clinical service outweigh potential benefits. Clinicians should not routinely offer the service to asymptomatic patients.
+Level I: Scientific evidence is lacking, of poor quality, or conflicting, such that the risk versus benefit balance cannot be assessed. Clinicians should help patients understand the uncertainty surrounding the clinical service.
+GRADE guideline panelists may make strong or weak recommendations on the basis of further criteria. Some of the important criteria are the balance between desirable and undesirable effects (not considering cost), the quality of the evidence, values and preferences and costs (resource utilization).
+Despite the differences between systems, the purposes are the same: to guide users of clinical research information on which studies are likely to be most valid. However, the individual studies still require careful critical appraisal.
+
+=== Statistical measures ===
+Evidence-based medicine attempts to express clinical benefits of tests and treatments using mathematical methods. Tools used by practitioners of evidence-based medicine include:
+
+Likelihood ratio: The pre-test odds of a particular diagnosis, multiplied by the likelihood ratio, give the post-test odds. (Odds can be calculated from, and then converted back to, the more familiar probability.) This reflects Bayes' theorem. The differences in likelihood ratio between clinical tests can be used to prioritize clinical tests according to their usefulness in a given clinical situation.
+AUC-ROC: The area under the receiver operating characteristic curve (AUC-ROC) reflects the relationship between sensitivity and specificity for a given test. High-quality tests will have an AUC-ROC approaching 1, and high-quality publications about clinical tests will provide information about the AUC-ROC. Cutoff values for positive and negative tests can influence specificity and sensitivity, but they do not affect AUC-ROC.
+Number needed to treat (NNT)/Number needed to harm (NNH): NNT and NNH are ways of expressing the effectiveness and safety, respectively, of interventions in a way that is clinically meaningful. NNT is the number of people who need to be treated in order to achieve the desired outcome (e.g. survival from cancer) in one patient. For example, if a treatment increases the chance of survival by 5 percentage points (an absolute risk reduction of 0.05), then 1/0.05 = 20 people need to be treated in order for 1 additional patient to survive because of the treatment. The concept can also be applied to diagnostic tests. For example, if 1,339 women age 50–59 need to be invited for breast cancer screening over a ten-year period in order to prevent one woman from dying of breast cancer, then the NNT for being invited to breast cancer screening is 1,339.
+
+=== Quality of clinical trials ===
+Evidence-based medicine attempts to objectively evaluate the quality of clinical research by critically assessing techniques reported by researchers in their publications.
+
+Trial design considerations: High-quality studies have clearly defined eligibility criteria and have minimal missing data.
+Generalizability considerations: Studies may only be applicable to narrowly defined patient populations and may not be generalizable to other clinical contexts.
+Follow-up: Sufficient time for defined outcomes to occur can influence the prospective study outcomes and the statistical power of a study to detect differences between a treatment and control arm.
+Power: A mathematical calculation can determine whether the number of patients is sufficient to detect a difference between treatment arms. A negative study may reflect a lack of benefit, or simply an insufficient number of patients to detect a difference.
+
+== Limitations and criticism ==
+There are a number of limitations and criticisms of evidence-based medicine. Two widely cited categorization schemes for the various published critiques of EBM include the three-fold division of Straus and McAlister ("limitations universal to the practice of medicine, limitations unique to evidence-based medicine and misperceptions of evidence-based-medicine") and the five-point categorization of Cohen, Stavri and Hersh (EBM is a poor philosophic basis for medicine, defines evidence too narrowly, is not evidence-based, is limited in usefulness when applied to individual patients, or reduces the autonomy of the doctor/patient relationship).
+In no particular order, some published objections include:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence-based_medicine-5.md b/data/en.wikipedia.org/wiki/Evidence-based_medicine-5.md
new file mode 100644
index 000000000..e64fd8125
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence-based_medicine-5.md
@@ -0,0 +1,42 @@
+---
+title: "Evidence-based medicine"
+chunk: 6/6
+source: "https://en.wikipedia.org/wiki/Evidence-based_medicine"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:04.124595+00:00"
+instance: "kb-cron"
+---
+
+Research produced by EBM, such as from randomized controlled trials (RCTs), may not be relevant for all treatment situations. Research tends to focus on specific populations, but individual persons can vary substantially from population norms. Because certain population segments have been historically under-researched (due to reasons such as race, gender, age, and co-morbid diseases), evidence from RCTs may not be generalizable to those populations. Thus, EBM applies to groups of people, but this should not preclude clinicians from using their personal experience in deciding how to treat each patient. One author advises that "the knowledge gained from clinical research does not directly answer the primary clinical question of what is best for the patient at hand" and suggests that evidence-based medicine should not discount the value of clinical experience. Another author stated that "the practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research."
+Use of evidence-based guidelines often fits poorly for complex, multimorbid patients. This is because the guidelines are usually based on clinical studies focused on single diseases. In reality, the recommended treatments in such circumstances may interact unfavorably with each other and often lead to polypharmacy.
+The theoretical ideal of EBM (that every narrow clinical question, of which hundreds of thousands can exist, would be answered by meta-analysis and systematic reviews of multiple RCTs) faces the limitation that research (especially the RCTs themselves) is expensive; thus, in reality, for the foreseeable future, the demand for EBM will always be much higher than the supply, and the best humanity can do is to triage the application of scarce resources.
+Research can be influenced by biases such as political or belief bias, publication bias and conflict of interest in academic publishing. For example, studies with conflicts due to industry funding are more likely to favor their product. It has been argued that contemporary evidence-based medicine is an illusion, since evidence-based medicine has been corrupted by corporate interests, failed regulation, and commercialisation of academia.
+Systematic review methodologies are susceptible to bias and abuse in respect of (i) choice of inclusion criteria, (ii) choice of outcome measures, comparisons and analyses, and (iii) the subjectivity inevitable in risk-of-bias assessments, even when codified procedures and criteria are observed. An example of all these problems can be seen in a Cochrane Review.
+A lag exists between when the RCT is conducted and when its results are published.
+A lag exists between when results are published and when they are properly applied.
+Hypocognition (the absence of a simple, consolidated mental framework into which new information can be placed) can hinder the application of EBM.
+Values: while patient values are considered in the original definition of EBM, the importance of values is not commonly emphasized in EBM training, a potential problem under current study.
+A 2018 study, "Why all randomised controlled trials produce biased results", assessed the 10 most cited RCTs and argued that trials face a wide range of biases and constraints, from trials only being able to study a small set of questions amenable to randomisation and generally only being able to assess the average treatment effect of a sample, to limitations in extrapolating results to another context, among many others outlined in the study.
+
+== Application of evidence in clinical settings ==
+
+Despite the emphasis on evidence-based medicine, unsafe or ineffective medical practices may occur. Contributing factors include clinicians not keeping up with or acting on current evidence, the rapid pace of scientific change, financial incentives, and patient demand for tests or treatments. Even when the evidence unequivocally shows that a treatment is either not safe or ineffective, it may take many years for other treatments to be adopted.
+Several factors may contribute to lack of uptake or implementation of evidence-based recommendations. These include lack of awareness at the individual clinician or patient (micro) level, and lack of institutional support at the organisation (meso) level or at the policy (macro) level. In other cases, significant change can require a generation of physicians to be replaced by physicians who were trained with more recent evidence.
+Revision of clinical guidelines to include an implementation plan may facilitate uptake of new procedures, including analysis of the context, identifying barriers and facilitators, and designing strategies to address them.
+
+== Education ==
+Training in evidence-based medicine is offered across the continuum of medical education. Educational competencies have been created for the education of health care professionals.
+The Berlin questionnaire and the Fresno Test are validated instruments for assessing the effectiveness of education in evidence-based medicine. These questionnaires have been used in diverse settings.
+A Campbell systematic review that included 24 trials examined the effectiveness of e-learning in improving evidence-based health care knowledge and practice. It was found that e-learning, compared to no learning, improves evidence-based health care knowledge and skills but not attitudes and behaviour. No difference in outcomes is present when comparing e-learning with face-to-face learning. Combining e-learning and face-to-face learning (blended learning) has a positive impact on evidence-based knowledge, skills, attitude and behavior. As a form of e-learning, some medical school students engage in editing Wikipedia to increase their EBM skills, and some students construct EBM materials to develop their skills in communicating medical knowledge.
+
+== See also ==
+
+== References ==
+
+== Bibliography ==
+
+== External links ==
+
+Evidence-Based Medicine – An Oral History, JAMA and the BMJ, 2014. 
+Centre for Evidence-based Medicine at the University of Oxford.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-0.md b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-0.md
new file mode 100644
index 000000000..990ee367b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-0.md
@@ -0,0 +1,33 @@
+---
+title: "Evidence and documentation for the Holocaust"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:11.488454+00:00"
+instance: "kb-cron"
+---
+
+The Holocaust—the systematic killing of about six million Jews by Nazi Germany from 1941 to 1945—is the most documented genocide in history. Although there is no single document which lists the names of all Jewish victims of Nazi persecution, there is conclusive evidence that about six million Jews were murdered. There is also conclusive evidence that Jews were gassed at Auschwitz-Birkenau, the Operation Reinhard extermination camps, and in gas vans, and that there was a systematic plan by the Nazi leadership to murder them.
+Evidence for the Holocaust comes in four main varieties:
+
+Contemporary documents, including a wide variety of "letters, memos, blueprints, orders, bills, speeches"; Holocaust train schedules and statistical summaries generated by the SS; and photographs, including official photographs, clandestine photographs by survivors, aerial photographs, and film footage of the liberation of the camps. More than 3,000 tons of records were collected for the Nuremberg trials.
+Later testimony from tens of thousands of eyewitnesses, including survivors such as Sonderkommandos, who directly witnessed the extermination process; perpetrators such as Nazi leaders, SS guards, and Nazi concentration camp commandants; and local townspeople. Moreover, virtually none of the perpetrators put on trial denied the reality of the systematic murder, with the most common excuse (where one was given) being that they were just following orders.
+Material evidence in the form of concentration and extermination camps, which still exist with various amounts of the original structure preserved, and thousands of mass graves containing the corpses of Holocaust victims.
+Circumstantial evidence: during World War II, the population of Jews in German-occupied Europe was reduced by about six million. About 2.7 million Jews were deported to Auschwitz-Birkenau, Kulmhof extermination camp, and the Operation Reinhard camps, never to be seen or heard from again.
+The perpetrators attempted to avoid creating explicit evidence and they also tried to destroy the documentary and material evidence of their crimes before the German defeat. Nevertheless, much of the evidence was preserved and collected by Allied investigators during and after the war, and the overwhelming evidence of the crimes ultimately made such erasure attempts futile. Collectively, the evidence refutes the arguments of Holocaust deniers that the Holocaust did not occur as described in historical scholarship.
+
+== Hitler's involvement ==
+
+=== Policy ===
+
+Historians, including Ian Kershaw, Raul Hilberg, and Martin Broszat, indicate that no document exists showing that Hitler ordered the Holocaust. However, other evidence makes clear that Hitler knew about and ordered the genocide. Statements from top-ranking Nazis such as Adolf Eichmann, Joseph Goebbels, and Heinrich Himmler also indicate that Hitler orchestrated the Holocaust and statements from Hitler himself reveal his genocidal intentions toward Jewry.
+
+=== Order and responsibility ===
+
+In a draft of an internal memorandum, dated 18 September 1942, Reichsführer-SS Heinrich Himmler wrote that "in principle the Fuehrer's time is no longer to be burdened with these matters"; the memorandum goes on to outline Himmler's vision, including "The delivery of anti-social elements from the execution of their sentences to the Reich Fuehrer of SS to be worked to death. Persons under protective arrest, Jews, Gypsies, Russians and Ukrainians, Poles with more than 3-year sentences, Czechs and Germans with more than 8-year sentences according to the judgement of the Minister of Justice [Thierack]. First of all, the worst anti-social elements amongst those just mentioned are to be handed over; I shall inform the Fuhrer of this through Reichsleiter Bormann."
+Nevertheless, and in contrast to the T4 euthanasia program, no document written or signed by Hitler ordering the Holocaust has ever been found. Deniers have claimed that this lack of order shows genocide was not Nazi policy.
+During David Irving's unsuccessful libel action against Deborah Lipstadt, he indicated that he considered a document signed by Hitler ordering the 'Final Solution' would be the only convincing proof of Hitler's responsibility. He was, however, described as content to accuse Winston Churchill of responsibility for ordering the assassination of General Sikorski, despite having no documentary evidence to support his claim. Mr Justice Gray concluded that this was a double standard.
+Historians have documented evidence that as Germany's defeat became imminent and the Nazi leaders realized that they would most likely be captured and brought to trial, a great effort to destroy all of the evidence of mass extermination was made. In the spring of 1942, Himmler ordered all of the traces of murdered Russian Jews and all of the traces of murdered prisoners of war to be removed from the occupied territories of the Soviet Union.  As one of many examples, the bodies of the 25,000 mostly Latvian Jews whom Friedrich Jeckeln and the soldiers under his command had shot at Rumbula (near Riga) in late 1941 were dug up and burned in 1943.
+In mid-1942, Reinhard Heydrich, through Heinrich Mueller, Chief of the Gestapo, ordered Paul Blobel in Sonderaktion 1005 to remove all traces of the mass executions in the East carried out by the Einsatzgruppen. After Blobel and his staff developed a special incineration process, destruction of evidence at Belzec and Sobibor followed in late 1942. In February 1943, Himmler personally visited Treblinka and ordered the commandants to destroy records, crematoria, and other signs of mass extermination.
+In the Posen speeches of October 1943, Himmler explicitly referred to the extermination of the Jews of Europe and further stated that the genocide must be permanently kept secret. On 4 October, he said:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-1.md b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-1.md
new file mode 100644
index 000000000..3a94df28f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-1.md
@@ -0,0 +1,28 @@
+---
+title: "Evidence and documentation for the Holocaust"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:11.488454+00:00"
+instance: "kb-cron"
+---
+
+I also want to refer here very frankly to a very difficult matter. We can now very openly talk about this among ourselves, and yet we will never discuss this publicly. Just as we did not hesitate on June 30, 1934, to perform our duty as ordered and put comrades who had failed up against the wall and execute them, we also never spoke about it, nor will we ever speak about it. Let us thank God that we had within us enough self-evident fortitude never to discuss it among us, and we never talked about it. Every one of us was horrified, and yet every one clearly understood that we would do it next time, when the order is given and when it becomes necessary.
+I am now referring to the evacuation of the Jews, to the extermination of the Jewish people.
+Historian Peter Longerich states that Hitler "avoided giving a clear written order to exterminate Jewish civilians". Wide protest was evoked when Hitler's authorisation of the T4 program became public knowledge in Germany, and he was forced to put a halt to it as a result (nonetheless it continued discreetly). This made Hitler realise that such undertakings must be done secretly in order to avoid criticism. Critics also point out that if Hitler did sign such an order in the first place, it would have been one of the first documents to be destroyed. 
+Evidence of a verbal order from Hitler includes a handwritten note by Himmler on a meeting with Hitler at the Wolfsschanze on 18 December 1941, which read: "Jewish Question; to be exterminated as partisans". Historians have argued that this indicates Hitler gave a verbal order to Himmler at this meeting for the Einsatzgruppen to target Jews under the guise of anti-partisan warfare.
+According to Felix Kersten's memoirs, Himmler told him that the extermination of the Jews was expressly ordered by Hitler and had been delegated to Himmler.
+
+==== According to Nazis ====
+Many statements from the Nazis from 1941 onwards addressed the imminent extermination of the Jews.
+In a draft of an internal memorandum, dated 25 October 1941, Heinrich Himmler wrote:
+
+ As the affairs now stand, there are no objections against doing away with those Jews who are not able to work, with the Brack remedy.
+Joseph Goebbels had frequent discussions with Hitler about the fate of the Jews, a subject they discussed almost every time they met, and he frequently wrote about it in his personal diary:
+
+14 February 1942: "The Führer once again expressed his determination to clean up the Jews in Europe pitilessly. There must be no squeamish sentimentalism about it. The Jews have deserved the catastrophe that has now overtaken them. Their destruction will go hand in hand with the destruction of our enemies. We must hasten this process with cold ruthlessness."
+27 March 1942: "A judgment is being visited upon the Jews that, while barbaric, is fully deserved by them. The prophecy which the Führer made about them for having brought on a new world war is beginning to come true in a most terrible manner. One must not be sentimental in these matters. If we did not fight the Jews, they would destroy us. It's a life-and-death struggle between the Aryan race and the Jewish bacillus."
+On 16 November 1941, Goebbels published an article "The Jews are to blame" which returned to Hitler's prophecy of 1939 and stated that world Jewry was suffering a "gradual process of extermination". Goebbels wrote: "Some six million Jews still live in the East, and this question can only be solved by a biological extermination of the whole of Jewry in Europe".
+On 13 March 1945, Goebbels wrote in his diary that the "rest of the world" should follow Germany's example in "destroying the Jews"; he also wrote about how the Jews in Germany at that point had been almost totally destroyed. This diary contains numerous other references to the mass extermination of Jews, including how "tens of thousands of them are liquidated" in eastern occupied territory, and that "the greater the number of Jews liquidated, the more consolidated will the situation in Europe be after this war." When speaking about this document under oath, David Irving is quoted as saying "There is no explicit reference...to the liquidation of Jews", and critics of Holocaust denial consequently state that it is dishonest to say such a thing when it is entirely contradicted by the diary of one of Hitler's closest associates.
+When questioned by interrogators whether orders for the extermination of Jews were delegated in writing by Himmler, Adolf Eichmann stated:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-2.md b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-2.md
new file mode 100644
index 000000000..1f10ee31b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-2.md
@@ -0,0 +1,36 @@
+---
+title: "Evidence and documentation for the Holocaust"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:11.488454+00:00"
+instance: "kb-cron"
+---
+
+I never saw a written order, Herr Hauptmann. All I know is that Heydrich said to me: "The Führer has ordered the physical extermination of the Jews." He said that as clearly and surely as I'm repeating it now.
+Critics state that Eichmann gives a virtually identical account of this in his memoirs, and state that it is also asserted that Eichmann never even asked for a written order, on the basis that "Hitler's wish as expressed through Himmler and Heydrich was good enough for him". Eichmann's memoirs were recorded by Willem Sassen before he was captured, and Eichmann's lawyer tried to prevent them from being presented as evidence to avoid any detriment against his case.
+In a speech, David Irving states that Heydrich told Eichmann, "The Führer has given the order for the physical destruction of the Jews". Irving admits that this contradicts his view that "Hitler wasn't involved", but explains it by suggesting that a completely different meaning can be construed, i.e. "the extirpation of Judaism" as opposed to the physical destruction of Jews if one changes "just one or two words". Critics of this view state that historians should not change words if their documents contradict their claims, and consequently point out five instances where Eichmann unambiguously states "physical extermination" during his interrogation.
+At a conference in 1941 discussing the Jewish Question, Alfred Rosenberg said:
+
+Some six million Jews still live in the East, and this question can only be solved by a biological extermination of the whole of Jewry in Europe. The Jewish Question will only be solved for Germany when the last Jew has left German territory, and for Europe when not a single Jew stands on the European continent as far as the Urals... And to this end it is necessary to force them beyond the Urals or otherwise bring about their eradication.
+At the Einsatzgruppen Trial in 1947, SS-Obersturmbannführer Martin Sandberger recalled that his superior, SS-Gruppenführer Bruno Streckenbach, had informed him and other Einsatzgruppen commanders of an order from Hitler to eliminate all Jews in the Eastern Territories at a meeting at the Palais Prinz Albrecht in 1941.
+Rudolf Höss, commandant of the Auschwitz concentration camp, wrote a series of memoirs about his role in the Holocaust while awaiting execution after the war. In these memoirs Höss stated that Himmler had briefed him about the Final Solution and his role in it in summer 1941; during the meeting, Himmler told him that the order for the Final Solution came directly from Hitler.
+
+=== Awareness ===
+
+Congruent with the evidence that shows Hitler was responsible for the order to murder Jews, there is also evidence that shows he was made aware of the process. Gestapo Chief Heinrich Müller sent a telegram on 2 August 1941, ordering that "especially interesting illustrative" material should be sent to Berlin because, "the Führer should be presented with continuous reports on the work of Einsatzgruppen in the East from here". At the end of December 1942 Hitler received a document from Himmler entitled, "Report to the Führer on Combating Partisans", stating that 363,211 Jews had been murdered by the Einsatzgruppen in August–November 1942. This document was specifically printed in large font that Hitler could read without glasses and was marked "Shown to the Führer".
+
+== Himmler's speeches ==
+
+Critics of Holocaust denial state that the claim by deniers of no Nazi plan to exterminate the Jews is discredited by Himmler in a speech made on 4 October 1943 to a gathering of SS officers in Poznań, where he said:
+
+In a speech at Sonthofen on 24 May 1944, Himmler said to a group of German generals:
+I believe, gentlemen, that you know me well enough to realize that I am not a bloodthirsty man nor a man who takes pleasure or finds sport in the harsher things he must do. On the other hand, I have strong nerves and a great sense of duty—if I do say so myself—and when I recognize the necessity to do something, I will do it unflinchingly. As to the Jewish women and children, I did not believe I had a right to let these children grow up to become avengers who would kill our fathers [sic] and grandchildren. That, I thought, would be cowardly.
+
+== Use of gas chambers ==
+The German firm Topf and Sons manufactured gas chambers to be used in concentration camps for extermination.
+
+Despite the difficulty of finding traces of this material, in February 1990, Professor Jan Markiewicz, Director of the Institute of Forensic Research in Kraków, redid the analysis. Markiewicz and his team used microdiffusion techniques to test for cyanide in samples from the suspected gas chambers, from delousing chambers, and from control areas elsewhere within Auschwitz. The control samples tested negative, while cyanide residue was found in high concentrations in the delousing chambers, and lower concentrations in the homicidal gas chambers. This is consistent with the amounts required to kill lice and humans.
+The search for cyanide in the bricks of buildings said to have been gas chambers was important, because the pesticide Zyklon B would generate such a residue. This was the gas most often cited as the murder instrument for prisoners in the gas chambers, supported by both testimony and evidence collected of Nazi policy.
+Another claim made by Holocaust deniers is that there were no specially-constructed vents in the gas chambers through which Zyklon B could be released. The BBC offers a response showing that this requires disregard of much documentation:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-3.md b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-3.md
new file mode 100644
index 000000000..0e34a2990
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-3.md
@@ -0,0 +1,31 @@
+---
+title: "Evidence and documentation for the Holocaust"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:11.488454+00:00"
+instance: "kb-cron"
+---
+
+Deniers have said for years that physical evidence is lacking because they have seen no holes in the roof of the Birkenau gas chamber where the Zyklon was poured in. (In some of the gas chambers the Zyklon B was poured in through the roof, while in others it was thrown in through the windows.) The roof was dynamited at war's end, and today lies broken in pieces, but three of the four original holes were positively identified in a recent paper. Their location in the concrete matches with eyewitness testimony, aerial photos from 1944, and a ground photo from 1943. The physical evidence shows unmistakably that the Zyklon holes were cast into the concrete when the building was constructed.
+Deniers also claim that the doors of gas chambers, some of which were made out of wood, were not airtight enough for the chambers to have worked correctly, an assertion that has been thoroughly debunked.
+
+Cremation in the open at the Reinhard extermination camps (Treblinka, Sobibor and Belzec) was discussed at Nuremberg on 7 April 1946 by Georg Konrad Morgen, an SS judge and lawyer who investigated crimes committed in Nazi concentration camps. He stated: "The whole thing was like an assembly line. At the last stop they reached a big room, and were told that this was the bath. When the last one was in, the doors were shut and the gas was let into the room. As soon as death taken place in (sic), the ventilators were started. When the air was breathable, the doors were opened, and the Jewish workers removed the bodies. By means of a special process which Wirth had invented, they were burned in the open air without the use of fuel."
+There is well-documented evidence that other ash was used as fertilizer in nearby fields. Photographs of Treblinka taken by the camp commandant show what looks to be ash piles being distributed by steam shovels.
+The Nizkor Project and other sources have stated that the minimal concentration of Zyklon B to be explosive is 56,000 parts per million, while 300 parts per million is fatal to humans, as is evidenced in The Merck Index and the CRC Handbook of Chemistry and Physics. In fact, the Nazis' own documentation stated "Danger of explosion: 75 grams of HCN in 1 cubic meter of air. Normal application approx. 8–10 grams per cubic meter, therefore not explosive."
+The Institute for Historical Review publicly offered a reward of $50,000 for verifiable "proof that gas chambers for the purpose of killing human beings existed at or in Auschwitz." Mel Mermelstein, a survivor of Auschwitz, submitted his own testimony as proof but it was ignored. He then sued IHR in the United States and the case was subsequently settled for $50,000, plus $40,000 in damages for personal suffering. The court declared the statement that "Jews were gassed to death at the Auschwitz Concentration Camp in Poland during the summer of 1944" was a fact.
+
+== Victims ==
+
+=== Six million ===
+
+The vast majority of scholars and institutions, as well as one Nazi official, estimate that between five and six million Jews perished during the Holocaust. Yad Vashem has collected the names of approximately 4.5 million Jewish victims, and numerous documents and archives discovered after the war gave meticulous accounts of the exterminations that took place at the death camps (such as Auschwitz and Treblinka).
+
+=== Jewish population ===
+The 1932 American Jewish Yearbook estimates the total number of Jews in the world at 15,192,218, of whom 9,418,248 resided in Europe. However, the 1947 yearbook states: "Estimates of the world Jewish population have been assembled by the American Jewish Joint Distribution Committee (except for the United States and Canada) and are probably the most authentic available at the present time. The figures reveal that the total Jewish population of the world has decreased by one-third from about 16,600,000 in 1939 to about 11,000,000 in 1946 as the result of the annihilation by the Nazis of more than five and a half million European Jews. In Europe only an estimated 3,642,000 remain of the total Jewish pre-war population of approximately 9,740,000." These numbers are also consistent with the findings of the Anglo-American Committee of Inquiry, Appendix III, in 1946.
+
+== Nazi documentation ==
+
+The Nazis used figures of between 9 and 11 million for the Jewish population of Europe, as evidenced in the notes of the Wannsee Conference. In fact, the Nazis methodically recorded the ongoing reduction of the Jewish population, as in the Korherr Report, which gave the status of the Final Solution through December 1942. The Höfle Telegram was sent by Hermann Höfle on 11 January 1943 to Adolf Eichmann in Berlin and detailed the number of Jews murdered in the concentration camps. For the year 1942 alone, the telegram lists 1,274,166 Jews as exterminated in the four camps of Aktion Reinhard.
+The Korherr Report, compiled by an SS statistician, gave a conservative total of 2,454,000 Jews deported to extermination camps or murdered by the Einsatzgruppen. The complete status reports of the Einsatzgruppen death squads were found in the archives of the Gestapo when they were searched by the U.S. Army, and their accuracy was attested to by former Einsatzgruppen members who testified during war crime trials and at other times. These reports alone list an additional 1,500,000 or so murders during mass shootings; the vast majority of these victims were Jews. Further, surviving Nazi documentation spells out their plans to murder the Jews of Europe (see the Wannsee Conference), records the trains arriving at various death camps, and includes photographs and films of many atrocities.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-4.md b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-4.md
new file mode 100644
index 000000000..ee9d27bdc
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust-4.md
@@ -0,0 +1,25 @@
+---
+title: "Evidence and documentation for the Holocaust"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Evidence_and_documentation_for_the_Holocaust"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:11.488454+00:00"
+instance: "kb-cron"
+---
+
+== Testimonies ==
+There is a voluminous amount of testimony from tens of thousands of survivors of the Holocaust, as well as the testimony of captured Nazi officers at the Nuremberg Trials and other times. Höss's testimony did not consist of merely a signed confession; while in jail he also wrote two volumes of memoirs and gave extensive testimony outside of the Nuremberg proceedings. Further, his testimony agrees with that of other contemporary written accounts by Auschwitz officials, such as Pery Broad, an SS man stationed at Auschwitz while Höss was the commandant, and the diary kept by Johann Kremer, an SS physician at Auschwitz, as well as the testimony of hundreds of camp guards and victims. Auschwitz guard Reinhold Hanning even testified that it was common knowledge among camp personnel that "the majority of people who arrived in the trains were killed". In addition, former SS personnel have criticised Holocaust denial. SS-Oberscharführer Josef Klehr said that anyone who maintains that nobody was gassed at Auschwitz must be "crazy or on the wrong". SS-Unterscharführer Oswald Kaduk stated that he did not consider those who maintain such a thing as normal people. Karl Frenzel, a senior officer at the Sobibor extermination camp, stated in a 1983 interview that "It is wrong to say that it never happened" in reference to Jews being gassed at the camp. Hearing about Holocaust denial compelled former SS-Rottenführer Oskar Gröning to publicly speak about what he witnessed at Auschwitz, and denounce Holocaust deniers, stating:
+
+I would like you to believe me. I saw the gas chambers. I saw the crematoria. I saw the open fires. I was on the ramp when the selections took place. I would like you to believe that these atrocities happened because I was there.
+Hans Münch, a former SS physician, signed a document certifying what he witnessed at Auschwitz: "thousands of people gassed", and the usage of Zyklon B in gas chambers. According to Münch's estimation, prisoners died within three to five minutes of exposure to Zyklon B. In an interview on Swedish television in 1981 Münch described the extermination process in detail and confirmed that "special treatment" in the context of Auschwitz referred to physical extermination.
+During Fedorenko v. United States, a deportation case involving former Treblinka guard Feodor Fedorenko, he testified that he had been stationed in a guard tower overlooking the camp and admitted that the gas chambers were visible from this vantage point and that he had witnessed dead bodies being removed from the gas chambers on multiple occasions.
+In the 1985 Holocaust documentary Shoah, Unterscharführer Franz Suchomel, tricked into an interview with false promises of anonymity, described his time at the Treblinka extermination camp. Suchomel related to the interviewer, Claude Lanzmann, how he saw dead bodies being removed from the gas chambers during a tour of the camp before explaining in depth the extermination of Jews at the camp through both gassing and shooting.
+Sonderkommandos provide another key piece of testimony. These were Jewish prisoners who helped march Jews to the gas chambers, and later dragged the bodies to the crematoria. Since they witnessed the entire process, their testimony is vital in confirming that the gas chambers were used for murderous purposes and the scale to which they were used.
+Other key testimony comes from non-Jewish survivors of the camps such as Catholic French Resistance member André Rogerie who was held in seven different camps, and who as a member of the Resistance was not targeted for extermination but for hard labor and survived. After the war Rogerie wrote and testified extensively about his experiences in the camps including Auschwitz-Birkenau, where he viewed and produced the oldest contemporary sketch of a camp crematorium.
+
+== References ==
+
+=== Citations ===
+
+=== Sources ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-0.md b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-0.md
new file mode 100644
index 000000000..5c16f0a10
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-0.md
@@ -0,0 +1,26 @@
+---
+title: "Evidence and efficacy of homeopathy"
+chunk: 1/6
+source: "https://en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:01.587272+00:00"
+instance: "kb-cron"
+---
+
+The infinitesimally low concentration of homeopathic preparations, which often lack even a single molecule of the diluted substance, has been the basis of questions about the effects of the preparations since the 19th century. Modern advocates of homeopathy have proposed a concept of "water memory", according to which water "remembers" the substances mixed in it, and transmits the effect of those substances when consumed. This concept is inconsistent with the current understanding of matter, and water memory has never been demonstrated to exist, in terms of any detectable effect, biological or otherwise.
+James Randi and the 10:23 campaign groups have highlighted the lack of active ingredients in most homeopathic products by taking large 'overdoses'. None of the hundreds of demonstrators in the UK, Australia, New Zealand, Canada and the US were injured and "no one was cured of anything, either".
+Outside of the alternative medicine community, scientists have long considered homeopathy a sham or a pseudoscience, and the mainstream medical community regards it as quackery. There is an overall absence of sound statistical evidence of therapeutic efficacy, which is consistent with the lack of any biologically plausible pharmacological agent or mechanism.
+Abstract concepts within theoretical physics have been invoked to suggest explanations of how or why preparations might work, including quantum entanglement, quantum nonlocality, the theory of relativity and chaos theory. Contrariwise, quantum superposition has been invoked to explain why homeopathy does not work in double-blind trials. However, the explanations are offered by nonspecialists within the field, and often include speculations that are incorrect in their application of the concepts and not supported by actual experiments. Several of the key concepts of homeopathy conflict with fundamental concepts of physics and chemistry. The use of quantum entanglement to explain homeopathy's purported effects is "patent nonsense", as entanglement is a delicate state that rarely lasts longer than a fraction of a second. While entanglement may result in certain aspects of individual subatomic particles acquiring linked quantum states, this does not mean the particles will mirror or duplicate each other, nor cause health-improving transformations.
+
+== Plausibility ==
+The proposed mechanisms for homeopathy are precluded from having any effect by the laws of physics and physical chemistry. The extreme dilutions used in homeopathic preparations usually leave not one molecule of the original substance in the final product.
+A number of speculative mechanisms have been advanced to counter this, the most widely discussed being water memory, though this is now considered erroneous since short-range order in water only persists for about 1 picosecond. No evidence of stable clusters of water molecules was found when homeopathic preparations were studied using nuclear magnetic resonance, and many other physical experiments in homeopathy have been found to be of low methodological quality, which precludes any meaningful conclusion. Existence of a pharmacological effect in the absence of any true active ingredient is inconsistent with the law of mass action and the observed dose-response relationships characteristic of therapeutic drugs (whereas placebo effects are non-specific and unrelated to pharmacological activity).
+Homeopaths contend that their methods produce a therapeutically active preparation, selectively including only the intended substance, though critics note that any water will have been in contact with millions of different substances throughout its history, and homeopaths have not been able to account for a reason why only the selected homeopathic substance would be a special case in their process. For comparison, ISO 3696:1987 defines a standard for water used in laboratory analysis; this allows for a contaminant level of ten parts per billion, 4C in homeopathic notation. This water may not be kept in glass as contaminants will leach out into the water.
+Practitioners of homeopathy hold that higher dilutions―described as being of higher potency―produce stronger medicinal effects. This idea is also inconsistent with observed dose-response relationships, where effects are dependent on the concentration of the active ingredient in the body. This dose-response relationship has been confirmed in myriad experiments on organisms as diverse as nematodes, rats, and humans. Some homeopaths contend that the phenomenon of hormesis may support the idea of dilution increasing potency, but the dose-response relationship outside the zone of hormesis declines with dilution as normal, and nonlinear pharmacological effects do not provide any credible support for homeopathy.
+Physicist Robert L. Park, former executive director of the American Physical Society, is quoted as saying: "since the least amount of a substance in a solution is one molecule, a 30C solution would have to have at least one molecule of the original substance dissolved in a minimum of 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 [or 10^60] molecules of water. This would require a container more than 30,000,000,000 times the size of the Earth." Park is also quoted as saying that, "to expect to get even one molecule of the 'medicinal' substance allegedly present in 30X pills, it would be necessary to take some two billion of them, which would total about a thousand tons of lactose plus whatever impurities the lactose contained".
+The laws of chemistry state that there is a limit to the dilution that can be made without losing the original substance altogether. This limit, which is related to the Avogadro constant, is roughly equal to homeopathic dilutions of 12C or 24X (1 part in 10^24).
+Scientific tests run by both the BBC's Horizon and ABC's 20/20 programmes were unable to differentiate homeopathic dilutions from water, even when using tests suggested by homeopaths themselves.
+In May 2018, the German skeptical organization GWUP issued an invitation to individuals and groups to respond to its challenge "to identify homeopathic preparations in high potency and to give a detailed description on how this can be achieved reproducibly." The first participant to correctly identify selected homeopathic preparations under an agreed-upon protocol will receive €50,000.
+
+== Efficacy ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-1.md b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-1.md
new file mode 100644
index 000000000..480490689
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-1.md
@@ -0,0 +1,27 @@
+---
+title: "Evidence and efficacy of homeopathy"
+chunk: 2/6
+source: "https://en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:01.587272+00:00"
+instance: "kb-cron"
+---
+
+No individual homeopathic preparation has been unambiguously shown by research to be different from placebo. The methodological quality of the primary research was generally low, with such problems as weaknesses in study design and reporting, small sample size, and selection bias. Since better quality trials have become available, the evidence for efficacy of homeopathy preparations has diminished; the highest-quality trials indicate that the preparations themselves exert no intrinsic effect. A review conducted in 2010 of all the pertinent studies of "best evidence" produced by the Cochrane Collaboration concluded that "the most reliable evidence – that produced by Cochrane reviews – fails to demonstrate that homeopathic medicines have effects beyond placebo."
+
+=== Government level reviews ===
+Government-level reviews have been conducted in recent years by Switzerland (2005), the United Kingdom (2009), Australia (2015) and the European Academies' Science Advisory Council (2017).
+The Swiss programme for the evaluation of complementary medicine (PEK) resulted in the peer-reviewed Shang publication (see Systematic reviews and meta-analyses of efficacy) and a controversial competing analysis by homeopaths and advocates led by Gudrun Bornhöft and Peter Matthiessen, which has misleadingly been presented as a Swiss government report by homeopathy proponents, a claim that has been repudiated by the Swiss Federal Office of Public Health. The Swiss Government terminated reimbursement, though it was subsequently reinstated after a political campaign and referendum for a further six-year trial period.
+The United Kingdom's House of Commons Science and Technology Committee sought written evidence and submissions from concerned parties and, following a review of all submissions, concluded that there was no compelling evidence of effect other than placebo and recommended that the Medicines and Healthcare products Regulatory Agency (MHRA) should not allow homeopathic product labels to make medical claims, that homeopathic products should no longer be licensed by the MHRA, as they are not medicines, and that further clinical trials of homeopathy could not be justified. They recommended that funding of homeopathic hospitals should not continue, and NHS doctors should not refer patients to homeopaths. By February 2011 only one-third of primary care trusts still funded homeopathy and by 2012 no British universities offered homeopathy courses. In July 2017, as part of a plan to save £200m a year by preventing the "misuse of scarce" funding, the NHS announced that it would no longer provide homeopathic medicines. A legal appeal by the British Homeopathic Association against the decision was rejected in June 2018.
+The Australian National Health and Medical Research Council completed a comprehensive review of the effectiveness of homeopathic preparations in 2015, in which it concluded that "there were no health conditions for which there was reliable evidence that homeopathy was effective. No good-quality, well-designed studies with enough participants for a meaningful result reported either that homeopathy caused greater health improvements than placebo, or caused health improvements equal to those of another treatment."
+On September 20, 2017, the European Academies' Science Advisory Council (EASAC) published its official analysis and conclusion on the use of homeopathic products, finding a lack of evidence that homeopathic products are effective, and raising concerns about quality control.
+
+=== Publication bias and other methodological problems ===
+
+The fact that individual randomized controlled trials have given positive results is not in contradiction with an overall lack of statistical evidence of efficacy. A small proportion of randomized controlled trials inevitably provide false-positive outcomes due to the play of chance: a statistically significant positive outcome is commonly adjudicated when the probability of it being due to chance rather than a real effect is no more than 5%―a level at which about 1 in 20 tests can be expected to show a positive result in the absence of any therapeutic effect. Furthermore, trials of low methodological quality (i.e. ones that have been inappropriately designed, conducted or reported) are prone to give misleading results. In a systematic review of the methodological quality of randomized trials in three branches of alternative medicine, Linde et al. highlighted major weaknesses in the homeopathy sector, including poor randomization. A separate 2001 systematic review that assessed the quality of clinical trials of homeopathy found that such trials were generally of lower quality than trials of conventional medicine.
+A related issue is publication bias: researchers are more likely to submit trials that report a positive finding for publication, and journals prefer to publish positive results. Publication bias has been particularly marked in alternative medicine journals, where few of the published articles (just 5% during the year 2000) tend to report null results. Regarding the way in which homeopathy is represented in the medical literature, a systematic review found signs of bias in the publications of clinical trials (towards negative representation in mainstream medical journals, and vice versa in alternative medicine journals), but not in reviews.
+Positive results are much more likely to be false if the prior probability of the claim under test is low.
+
+=== Systematic reviews and meta-analyses of efficacy ===
+Both meta-analyses, which statistically combine the results of several randomized controlled trials, and other systematic reviews of the literature are essential tools to summarize evidence of therapeutic efficacy. Early systematic reviews and meta-analyses of trials evaluating the efficacy of homeopathic preparations in comparison with placebo more often tended to generate positive results, but appeared unconvincing overall. In particular, reports of three large meta-analyses warned readers that firm conclusions could not be reached, largely due to methodological flaws in the primary studies and the difficulty in controlling for publication bias. The positive finding of one of the most prominent of the early meta-analyses, published in The Lancet in 1997 by Linde et al., was later reframed by the same research team, who wrote:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-2.md b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-2.md
new file mode 100644
index 000000000..f6620b776
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-2.md
@@ -0,0 +1,35 @@
+---
+title: "Evidence and efficacy of homeopathy"
+chunk: 3/6
+source: "https://en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:01.587272+00:00"
+instance: "kb-cron"
+---
+
+The evidence of bias [in the primary studies] weakens the findings of our original meta-analysis. Since we completed our literature search in 1995, a considerable number of new homeopathy trials have been published. The fact that a number of the new high-quality trials ... have negative results, and a recent update of our review for the most "original" subtype of homeopathy (classical or individualized homeopathy), seem to confirm the finding that more rigorous trials have less-promising results. It seems, therefore, likely that our meta-analysis at least overestimated the effects of homeopathic treatments.
+Subsequent work by John Ioannidis and others has shown that for treatments with no prior plausibility, the chances of a positive result being a false positive are much higher, and that any result not consistent with the null hypothesis should be assumed to be a false positive.
+A systematic review of the available systematic reviews confirmed in 2002 that higher-quality trials tended to have less positive results, and found no convincing evidence that any homeopathic preparation exerts clinical effects different from placebo.
+In 2005, The Lancet medical journal published a meta-analysis of 110 placebo-controlled homeopathy trials and 110 matched medical trials based upon the Swiss government's Programme for Evaluating Complementary Medicine, or PEK. The study concluded that its findings were "compatible with the notion that the clinical effects of homeopathy are placebo effects". This was accompanied by an editorial pronouncing "The end of homoeopathy". A 2017 systematic review and meta-analysis found that the most reliable evidence did not support the effectiveness of non-individualized homeopathy. The authors noted that "the quality of the body of evidence is low."
+Other meta-analyses include homeopathic treatments to reduce cancer therapy side-effects following radiotherapy and chemotherapy, allergic rhinitis, attention-deficit hyperactivity disorder and childhood diarrhoea, adenoid vegetation, asthma, upper respiratory tract infection in children, insomnia, fibromyalgia, psychiatric conditions and Cochrane Library systematic reviews of homeopathic treatments for asthma, dementia, attention-deficit hyperactivity disorder, induction of labour, upper respiratory tract infections in children, and irritable bowel syndrome. Other reviews covered osteoarthritis, migraines, postoperative ecchymosis and edema, delayed-onset muscle soreness, preventing postpartum haemorrhage, or eczema and other dermatological conditions.
+Some clinical trials have tested individualized homeopathy, and there have been reviews of this, specifically. A 1998 review found 32 trials that met their inclusion criteria, 19 of which were placebo-controlled and provided enough data for meta-analysis. These 19 studies showed a pooled odds ratio of 1.17 to 2.23 in favour of individualized homeopathy over the placebo, but no difference was seen when the analysis was restricted to the methodologically best trials. The authors concluded that "the results of the available randomized trials suggest that individualized homeopathy has an effect over placebo. The evidence, however, is not convincing because of methodological shortcomings and inconsistencies." Jay Shelton, author of a book on homeopathy, has stated that the claim assumes without evidence that classical, individualized homeopathy works better than nonclassical variations. A 2014 systematic review and meta-analysis found that individualized homeopathic remedies may be slightly more effective than placebos, though the authors noted that their findings were based on low- or unclear-quality evidence. The same research team later reported that taking into account model validity did not significantly affect this conclusion.
+The results of reviews are generally negative or only weakly positive, and reviewers consistently report the poor quality of trials. The finding of Linde et al. that more rigorous studies produce less positive results is supported in several and contradicted by none.
+
+=== Statements by major medical organizations ===
+
+Health organizations such as the UK's National Health Service, the American Medical Association, the FASEB, and the National Health and Medical Research Council of Australia, have issued statements of their conclusion that there is "no good-quality evidence that homeopathy is effective as a treatment for any health condition". In 2009, World Health Organization official Mario Raviglione criticized the use of homeopathy to treat tuberculosis; similarly, another WHO spokesperson argued there was no evidence homeopathy would be an effective treatment for diarrhoea. They warned against the use of homeopathy for serious conditions such as depression, HIV and malaria.
+The American College of Medical Toxicology and the American Academy of Clinical Toxicology recommend that no one use homeopathic treatment for disease or as a preventive health measure. These organizations report that no evidence exists that homeopathic treatment is effective, but that there is evidence that using these treatments produces harm and can bring indirect health risks by delaying conventional treatment.
+
+== Explanations of perceived effects ==
+Science offers a variety of explanations for how homeopathy may appear to cure diseases or alleviate symptoms even though the preparations themselves are inert:
+
+The placebo effect – the intensive consultation process and expectations for the homeopathic preparations may cause the effect.
+Therapeutic effect of the consultation – the care, concern, and reassurance a patient experiences when opening up to a compassionate caregiver can have a positive effect on the patient's well-being.
+Unassisted natural healing – time and the body's ability to heal without assistance can eliminate many diseases of their own accord.
+Unrecognized treatments – an unrelated food, exercise, environmental agent, or treatment for a different ailment, may have occurred.
+Regression towards the mean – since many diseases or conditions are cyclical, symptoms vary over time and patients tend to seek care when discomfort is greatest; they may feel better anyway but because of the timing of the visit to the homeopath they attribute improvement to the preparation taken.
+Non-homeopathic treatment – patients may also receive standard medical care at the same time as homeopathic treatment, and the former is responsible for improvement.
+Cessation of unpleasant treatment – often homeopaths recommend patients stop getting medical treatment such as surgery or drugs, which can cause unpleasant side-effects; improvements are attributed to homeopathy when the actual cause is the cessation of the treatment causing side-effects in the first place, but the underlying disease remains untreated and still dangerous to the patient.
+
+== Purported effects in other biological systems ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-3.md b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-3.md
new file mode 100644
index 000000000..0bff7b03f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-3.md
@@ -0,0 +1,22 @@
+---
+title: "Evidence and efficacy of homeopathy"
+chunk: 4/6
+source: "https://en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:01.587272+00:00"
+instance: "kb-cron"
+---
+
+While some articles have suggested that homeopathic solutions of high dilution can have statistically significant effects on organic processes including the growth of grain, histamine release by leukocytes, and enzyme reactions, such evidence is disputed since attempts to replicate them have failed. A 2007 systematic review of high-dilution experiments found that none of the experiments with positive results could be reproduced by all investigators.
+In 1987, French immunologist Jacques Benveniste submitted a paper to the journal Nature while working at INSERM. The paper purported to have discovered that basophils, a type of white blood cell, released histamine when exposed to a homeopathic dilution of anti-immunoglobulin E antibody. The journal editors, sceptical of the results, requested that the study be replicated in a separate laboratory. Upon replication in four separate laboratories the study was published. Still sceptical of the findings, Nature assembled an independent investigative team to determine the accuracy of the research, consisting of Nature editor and physicist Sir John Maddox, American scientific fraud investigator and chemist Walter Stewart, and sceptic James Randi. After investigating the findings and methodology of the experiment, the team found that the experiments were "statistically ill-controlled", "interpretation has been clouded by the exclusion of measurements in conflict with the claim", and concluded, "We believe that experimental data have been uncritically assessed and their imperfections inadequately reported." James Randi stated that he doubted that there had been any conscious fraud, but that the researchers had allowed "wishful thinking" to influence their interpretation of the data.
+In 2001 and 2004, Madeleine Ennis published a number of studies that reported that homeopathic dilutions of histamine exerted an effect on the activity of basophils. In response to the first of these studies, Horizon aired a programme in which British scientists attempted to replicate Ennis' results; they were unable to do so.
+
+== Ethics and safety ==
+The provision of homeopathic preparations has been described as unethical. Michael Baum, Professor Emeritus of Surgery and visiting Professor of Medical Humanities at University College London (UCL), has described homoeopathy as a "cruel deception".
+Edzard Ernst, the first professor of complementary medicine in the United Kingdom and a former homeopathic practitioner, has expressed his concerns about pharmacists who violate their ethical code by failing to provide customers with "necessary and relevant information" about the true nature of the homeopathic products they advertise and sell:
+
+"My plea is simply for honesty. Let people buy what they want, but tell them the truth about what they are buying. These treatments are biologically implausible and the clinical tests have shown they don't do anything at all in human beings. The argument that this information is not relevant or important for customers is quite simply ridiculous."
+Patients who choose to use homeopathy rather than evidence-based medicine risk missing timely diagnosis and effective treatment of serious conditions such as cancer.
+In 2013 the UK Advertising Standards Authority concluded that the Society of Homeopaths were targeting vulnerable ill people and discouraging the use of essential medical treatment while making misleading claims of efficacy for homeopathic products.
+In 2015 the Federal Court of Australia imposed penalties on a homeopathic company, Homeopathy Plus! Pty Ltd and its director, for making false or misleading statements about the efficacy of the whooping cough vaccine and homeopathic remedies as an alternative to the whooping cough vaccine, in breach of the Australian Consumer Law.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-4.md b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-4.md
new file mode 100644
index 000000000..5942ae905
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-4.md
@@ -0,0 +1,24 @@
+---
+title: "Evidence and efficacy of homeopathy"
+chunk: 5/6
+source: "https://en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:01.587272+00:00"
+instance: "kb-cron"
+---
+
+=== Adverse effects ===
+Some homeopathic preparations involve poisons such as Belladonna, arsenic, and poison ivy, which are highly diluted in the homeopathic preparation. In rare cases, the original ingredients are present at detectable levels. This may be due to improper preparation or intentional low dilution. Serious adverse effects such as seizures and death have been reported or associated with some homeopathic preparations.
+On September 30, 2016, the FDA issued a safety alert to consumers warning against the use of homeopathic teething gels and tablets following reports of adverse events after their use. The agency recommended that parents discard these products and "seek advice from their health care professional for safe alternatives" to homeopathy for teething. The pharmacy CVS announced, also on September 30, that it was voluntarily withdrawing the products from sale, and on October 11 Hyland's (the manufacturer) announced that it was discontinuing its teething medicine in the United States, though the products remain on sale in Canada. On October 12, Buzzfeed reported that the regulator had "examined more than 400 reports of seizures, fever and vomiting, as well as 10 deaths" over a six-year period. The investigation (including analyses of the products) is still ongoing and the FDA does not know yet if the deaths and illnesses were caused by the products. However, a previous FDA investigation in 2010, following adverse effects reported then, found that these same products were improperly diluted and contained "unsafe levels of belladonna, also known as deadly nightshade" and that the reports of serious adverse events in children using this product were "consistent with belladonna toxicity".
+Instances of arsenic poisoning have occurred after use of arsenic-containing homeopathic preparations. Zicam Cold remedy Nasal Gel, which contains 2X (1:100) zinc gluconate, reportedly caused a small percentage of users to lose their sense of smell; 340 cases were settled out of court in 2006 for US$12 million. In 2009, the FDA advised consumers to stop using three discontinued cold remedy Zicam products because they could cause permanent damage to users' sense of smell. Zicam was launched without a New Drug Application (NDA) under a provision in the FDA's Compliance Policy Guide called "Conditions under which homeopathic drugs may be marketed" (CPG 7132.15), but the FDA warned Matrixx Initiatives, its manufacturer, via a Warning Letter that this policy does not apply when there is a health risk to consumers.
+A 2000 review by homeopaths reported that homeopathic preparations are "unlikely to provoke severe adverse reactions". In 2012, a systematic review evaluating evidence of homeopathy's possible adverse effects concluded that "homeopathy has the potential to harm patients and consumers in both direct and indirect ways". One of the reviewers, Edzard Ernst, supplemented the article on his blog, writing: "I have said it often and I say it again: if used as an alternative to an effective cure, even the most 'harmless' treatment can become life-threatening." A 2016 systematic review and meta-analysis found that, in homeopathic clinical trials, adverse effects were reported among the patients who received homeopathy about as often as they were reported among patients who received placebo or conventional medicine.
+
+=== Lack of efficacy ===
+The lack of convincing scientific evidence supporting its efficacy and its use of preparations without active ingredients have led to characterizations as pseudoscience and quackery, or, in the words of a 1998 medical review, "placebo therapy at best and quackery at worst". The Russian Academy of Sciences considers homeopathy a "dangerous 'pseudoscience' that does not work", and "urges people to treat homeopathy 'on a par with magic'". The Chief Medical Officer for England, Dame Sally Davies, has stated that homeopathic preparations are "rubbish" and do not serve as anything more than placebos. Jack Killen, acting deputy director of the National Center for Complementary and Alternative Medicine, says homeopathy "goes beyond current understanding of chemistry and physics". He adds: "There is, to my knowledge, no condition for which homeopathy has been proven to be an effective treatment." Ben Goldacre says that homeopaths who misrepresent scientific evidence to a scientifically illiterate public have "... walled themselves off from academic medicine, and critique has been all too often met with avoidance rather than argument". Homeopaths often prefer to ignore meta-analyses in favour of cherry-picked positive results, such as by promoting a particular observational study (one which Goldacre describes as "little more than a customer-satisfaction survey") as if it were more informative than a series of randomized controlled trials.
+Referring specifically to homeopathy, the British House of Commons Science and Technology Committee has stated:
+
+In our view, the systematic reviews and meta-analyses conclusively demonstrate that homeopathic products perform no better than placebos. The Government shares our interpretation of the evidence.
+In the Committee's view, homeopathy is a placebo treatment and the Government should have a policy on prescribing placebos. The Government is reluctant to address the appropriateness and ethics of prescribing placebos to patients, which usually relies on some degree of patient deception. Prescribing of placebos is not consistent with an informed patient choice – which the Government claims is very important – as it means patients do not have all the information needed to make choice meaningful.
+Beyond ethical issues and the integrity of the doctor-patient relationship, prescribing pure placebos is bad medicine. Their effect is unreliable and unpredictable and cannot form the sole basis of any treatment on the NHS.
+The National Center for Complementary and Alternative Medicine of the United States' National Institutes of Health states:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-5.md b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-5.md
new file mode 100644
index 000000000..1c2bb5e47
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy-5.md
@@ -0,0 +1,27 @@
+---
+title: "Evidence and efficacy of homeopathy"
+chunk: 6/6
+source: "https://en.wikipedia.org/wiki/Evidence_and_efficacy_of_homeopathy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:01.587272+00:00"
+instance: "kb-cron"
+---
+
+Homeopathy is a controversial topic in complementary medicine research. A number of the key concepts of homeopathy are not consistent with fundamental concepts of chemistry and physics. For example, it is not possible to explain in scientific terms how a preparation containing little or no active ingredient can have any effect. This, in turn, creates major challenges to the rigorous clinical investigation of homeopathic preparations. For example, one cannot confirm that an extremely dilute preparation contains what is listed on the label, or develop objective measures that show effects of extremely dilute preparations in the human body.
+Ben Goldacre noted that in the early days of homeopathy, when medicine was dogmatic and frequently worse than doing nothing, homeopathy at least failed to make matters worse:
+
+During the 19th-century cholera epidemic, death rates at the London Homeopathic Hospital were three times lower than at the Middlesex Hospital. Homeopathic sugar pills won't do anything against cholera, of course, but the reason for homeopathy's success in this epidemic is even more interesting than the placebo effect: at the time, nobody could treat cholera. So, while hideous medical treatments such as blood-letting were actively harmful, the homeopaths' treatments at least did nothing either way.
+
+=== In lieu of standard medical treatment ===
+On clinical grounds, patients who choose to use homeopathy in preference to normal medicine risk missing timely diagnosis and effective treatment, thereby worsening the outcomes of serious conditions. Critics of homeopathy have cited individual cases of patients of homeopathy failing to receive proper treatment for diseases that could have been easily diagnosed and managed with conventional medicine and who have died as a result, and the "marketing practice" of criticizing and downplaying the effectiveness of mainstream medicine. Homeopaths claim that use of conventional medicines will "push the disease deeper" and cause more serious conditions, a process referred to as "suppression". Some homeopaths (particularly those who are non-physicians) advise their patients against immunization. Some homeopaths suggest that vaccines be replaced with homeopathic "nosodes", created from biological materials such as pus, diseased tissue, bacilli from sputum or (in the case of "bowel nosodes") faeces. While Hahnemann was opposed to such preparations, modern homeopaths often use them although there is no evidence to indicate they have any beneficial effects. Cases of homeopaths advising against the use of anti-malarial drugs have been identified. This puts visitors to the tropics who take this advice in severe danger, since homeopathic preparations are completely ineffective against the malaria parasite. Also, in one case in 2004, a homeopath instructed one of her patients to stop taking conventional medication for a heart condition, advising her on June 22, 2004, to "Stop ALL medications including homeopathic", advising her on or around August 20 that she no longer needed to take her heart medication, and adding on August 23, "She just cannot take ANY drugs – I have suggested some homeopathic remedies ... I feel confident that if she follows the advice she will regain her health." The patient was admitted to hospital the next day, and died eight days later, the final diagnosis being "acute heart failure due to treatment discontinuation".
+In 1978, Anthony Campbell, then a consultant physician at the Royal London Homeopathic Hospital, criticized statements by George Vithoulkas claiming that syphilis, when treated with antibiotics, would develop into secondary and tertiary syphilis with involvement of the central nervous system, saying that "The unfortunate layman might well be misled by Vithoulkas' rhetoric into refusing orthodox treatment".
+Vithoulkas' claims echo the idea that treating a disease with external medication used to treat the symptoms would only drive it deeper into the body, and they conflict with scientific studies, which indicate that penicillin treatment produces a complete cure of syphilis in more than 90% of cases.
+A 2006 review by W. Steven Pray of the College of Pharmacy at Southwestern Oklahoma State University recommends that pharmacy colleges include a required course in unproven medications and therapies, that ethical dilemmas inherent in recommending products lacking proven safety and efficacy data be discussed, and that students should be taught where unproven systems such as homeopathy depart from evidence-based medicine.
+In an article entitled "Should We Maintain an Open Mind about Homeopathy?" published in the American Journal of Medicine, Michael Baum and Edzard Ernst – writing to other physicians – wrote that "Homeopathy is among the worst examples of faith-based medicine... These axioms [of homeopathy] are not only out of line with scientific facts but also directly opposed to them. If homeopathy is correct, much of physics, chemistry, and pharmacology must be incorrect...".
+In 2013, Mark Walport, the UK Government Chief Scientific Adviser and head of the Government Office for Science, had this to say: "My view scientifically is absolutely clear: homoeopathy is nonsense, it is non-science. My advice to ministers is clear: that there is no science in homoeopathy. The most it can have is a placebo effect – it is then a political decision whether they spend money on it or not." His predecessor, John Beddington, referring to his views on homeopathy being "fundamentally ignored" by the Government, said: "The only one [view being ignored] I could think of was homoeopathy, which is mad. It has no underpinning of scientific basis. In fact, all the science points to the fact that it is not at all sensible. The clear evidence is saying this is wrong, but homoeopathy is still used on the NHS."
+
+== References ==
+
+== External links ==
+The evidence for homeopathy (by Robert Hahn) at Homeopath UK
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidence_gap_map-0.md b/data/en.wikipedia.org/wiki/Evidence_gap_map-0.md
index 36f33c158..e35bbbaa1 100644
--- a/data/en.wikipedia.org/wiki/Evidence_gap_map-0.md
+++ b/data/en.wikipedia.org/wiki/Evidence_gap_map-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Evidence_gap_map"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:35:09.354098+00:00"
+date_saved: "2026-05-05T09:56:02.798215+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Evidence_of_absence-0.md b/data/en.wikipedia.org/wiki/Evidence_of_absence-0.md
new file mode 100644
index 000000000..bc16e9ab0
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidence_of_absence-0.md
@@ -0,0 +1,45 @@
+---
+title: "Evidence of absence"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Evidence_of_absence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:36.622081+00:00"
+instance: "kb-cron"
+---
+
+Evidence of absence is evidence of any kind that suggests something is missing or that it does not exist. What counts as evidence of absence has been a subject of debate between scientists and philosophers. It is often distinguished from absence of evidence.
+
+
+== Overview ==
+
+Evidence of absence and absence of evidence are similar but distinct concepts. This distinction is captured in the aphorism "Absence of evidence is not evidence of absence."  This antimetabole is often attributed to Martin Rees or Carl Sagan, but a version appeared as early as 1888 in a writing by William Wright. In Sagan's words, the expression is a critique of the "impatience with ambiguity" exhibited by appeals to ignorance. Despite what the expression may seem to imply, a lack of evidence can be informative. For example, when testing a new drug, if no harmful effects are observed then this suggests that the drug is safe. This is because, if the drug were harmful, evidence of that fact can be expected to turn up during testing. The expectation of evidence makes its absence significant.
+As the previous example shows, the difference between evidence that something is absent (e.g., an observation that suggests there were no dragons here today) and simple absence of evidence (e.g., no careful research has been done) can be nuanced. Indeed, scientists will often debate whether an experiment's result should be considered evidence of absence, or if it remains absence of evidence. The debate regards whether the experiment would have detected the phenomenon of interest if it were there.
+An argument from ignorance based on "absence of evidence" is not necessarily fallacious; one example is the argument that a potentially life-saving new drug poses no long-term health risk unless proved otherwise. On the other hand, were such an argument to rely imprudently on the lack of research to promote its conclusion, it would be considered an informal fallacy, whereas the former can be a persuasive way to shift the burden of proof in an argument or debate.
+
+
+== Science ==
+In carefully designed scientific experiments, null results can be interpreted as evidence of absence. Whether the scientific community will accept a null result as evidence of absence depends on many factors, including the detection power of the applied methods, the confidence of the inference, and confirmation bias within the community. For instance, in amnesia studies, the absence of behavior indicative of memory is sometimes interpreted as the absence of the memory trace; however, certain researchers consider this interpretation flawed, as the memory impairment may be temporary due to deficits in recall. Alternatively, the memory trace may be latent and demonstrable via its indirect effects on new learning. Michael Davis, researcher at Emory University, argues that complete erasure can only be confidently inferred if all of the biological events that occurred when the memory was formed revert to their original status. Davis contends that because making these measurements in a complex organism is implausible, the concept of complete memory erasure (what he deems "strong form of forgetting") is not useful scientifically.
+
+
+== Law ==
+In many legal systems, a lack of evidence for a defendant's guilt is sufficient for acquittal. This is because of the presumption of innocence and the belief that it is worse to convict an innocent person than to let a guilty one go free. 
+On the other hand, the absence of evidence in the defendant's favor (e.g. an alibi) can make their guilt seem more likely. A jury can be persuaded to convict because of "evidentiary lacunae", or a lack of evidence they expect to hear.
+
+
+== Proving a negative ==
+
+A negative claim is a colloquialism for an affirmative claim that asserts the non-existence or exclusion of something. Proofs of negative claims are common in mathematics. Such claims include Euclid's theorem that there is no largest prime number, and Arrow's impossibility theorem. There can be multiple claims within a debate, nevertheless, whoever makes a claim usually carries the burden of proof regardless of positive or negative content in the claim.
+A negative claim may or may not exist as a counterpoint to a previous claim. A proof of impossibility or an evidence of absence argument are typical methods to fulfill the burden of proof for a negative claim.
+Philosopher Steven Hales argues that one can typically be as confident in the negation of a claim as in its affirmation. Hales says that if one's standards of certainty lead them to say "there is never 'proof' of non-existence", then they must also say that "there is never 'proof' of existence either". Hales argues that there are many cases where we may be able to prove something does not exist with as much certainty as proving something does exist. A similar position is taken by philosopher Stephen Law, who highlights that rather than focusing on the existence of "proof", a better question would be whether there is any reasonable doubt for existence or non-existence.
+
+
+== See also ==
+Argument from ignorance
+Argument from silence
+Contraposition
+Probatio diabolica
+Proof by exhaustion
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidentialism-0.md b/data/en.wikipedia.org/wiki/Evidentialism-0.md
new file mode 100644
index 000000000..68442c798
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidentialism-0.md
@@ -0,0 +1,26 @@
+---
+title: "Evidentialism"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Evidentialism"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:05.345615+00:00"
+instance: "kb-cron"
+---
+
+Evidentialism is a thesis in epistemology which states that one is justified to believe something if and only if that person has evidence which supports said belief. Evidentialism is, therefore, a thesis about which beliefs are justified and which are not. 
+For philosophers Richard Feldman and Earl Conee, evidentialism is the strongest argument for justification because it identifies the primary notion of epistemic justification. They argue that if a person's attitude towards a proposition fits their evidence, then their doxastic attitude for that proposition is epistemically justified. Feldman and Conee offer the following argument for evidentialism as an epistemic justification:
+(EJ) Doxastic attitude D toward proposition p is epistemically justified for S at t if and only if having D toward p fits the evidence.
+For Feldman and Conee one's doxastic attitude is justified if it fits one's evidence. EJ is meant to show the idea that justification is characteristically epistemic. This idea makes justification dependent on evidence.
+Feldman and Conee believe that because objections to EJ have become so prominent, their defense of it is appropriate. The theses that object to EJ imply that epistemic justification is dependent upon the "cognitive capacities of an individual or upon the cognitive processes or information-gatherings practices that lead to an attitude." For Feldman and Conee, EJ is in contrast to these theses; EJ contends that the epistemic justification for an attitude is only dependent upon evidence.
+
+== Criticism ==
+Critics of evidentialism sometimes reject the claim that a conclusion is justified only if one's evidence supports that conclusion. A typical counterexample goes like this. Suppose, for example, that Babe Ruth approaches the batter's box believing that he will hit a home run despite his current drunkenness and overall decline in performance in recent games. He realizes that, however unlikely it is that his luck will change, it would increase his chances of hitting a home run if he maintains a confident attitude. In these circumstances, critics of evidentialism argue that his belief that p = Babe Ruth will hit a home run is justified, even though his evidence does not support this belief.
+Evidentialists may respond to this criticism by forming a distinction between pragmatic or prudential justification and epistemic justification. In Babe Ruth's case, it is pragmatically justified that he believe p, but it is nevertheless epistemically unjustified: though the belief may be justified for the purpose of promoting some other goal (a successful at bat, in Ruth's case), it is not justified relative to the purely epistemic goal of having beliefs that are most likely to be true.
+A similar response follows the criticism that evidentialism implies all faith-based beliefs are unjustified. For example, fideism claims that evidence is irrelevant to religious beliefs and that attempts to justify religious beliefs in such a way are misguided. Superficially, fideism and evidentialism have mutually exclusive takes on religious beliefs, but evidentialists use the term "justification" in a much weaker sense than the one in which fideists most likely use it. Evidentialism merely defines the epistemic condition of a belief.
+Although evidentialism states that the content of the evidence does not matter, only that it constitutes valid justification towards some proposition, a skeptical criticism may be levelled at evidentialism from uncertainty theories. One's evidence may be objectively disproved at some point or it may be the case that one can never have absolute certainty of one's evidence. Given the logic of arguments concerning principles of uncertainty and randomness, skepticism towards knowledge merely becomes skepticism towards valid justification.
+Likewise, some say that the human mind is not naturally inclined to form beliefs based on evidence, viz. cognitive dissonance. While this may be the case, evidentialists admit, evidentialism is only meant to separate justified beliefs from unjustified beliefs. One can believe that evidentialism is true yet still maintain that the human mind is not naturally inclined to form beliefs based on evidence. One would simply have to conclude that the mind is not naturally inclined to form justified beliefs.
+
+== Infinite regress argument ==
+Evidentialism also faces a challenge from the infinite regress argument. This argument begins with the observation that, normally, one's supporting evidence for a belief consists of other beliefs. However, it seems that these other beliefs can do the job of justifying only if they themselves are already justified. And evidentialism demands that these supporting beliefs be justified by still further evidence if they are to be justified themselves. But this same reasoning would apply to the new, deeper level of supporting beliefs: they can only justify if they're themselves justified, and evidentialism therefore demands an even deeper level of supporting belief. According to this argument, a justified belief requires an endless supply of reasons. Some philosophers such as Thomas Nagel posit that this is an absurd conclusion.
+In general, responses to this argument can be classified in the following ways:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidentialism-1.md b/data/en.wikipedia.org/wiki/Evidentialism-1.md
new file mode 100644
index 000000000..1a5498bfb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidentialism-1.md
@@ -0,0 +1,35 @@
+---
+title: "Evidentialism"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Evidentialism"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:05.345615+00:00"
+instance: "kb-cron"
+---
+
+Foundationalism: There exist beliefs that are justified, but not because they are based on any other beliefs. These are called properly basic beliefs, and they are the foundation upon which all other justified beliefs ultimately rest.
+Coherentism: Justified beliefs are all evidentially supported by other beliefs, but an infinite set of beliefs is not generated, because the chains of evidential support among beliefs are allowed to move in a circle. On the resulting picture, a person's belief is justified when it fits together with the person's other beliefs in a coherent way in which the person's various beliefs mutually support one another.
+A "modest reasoner" subset of Coherentism would insist that all justifiable beliefs be statements about "some objects", since the negation/complement of a "some" statement is another "some" statement.
+Skepticism: There cannot be any justified beliefs.
+A "modest reasoner" subset of Skepticism, like the subset of Coherentism, would likewise insist that all justifiable beliefs be statements about "some objects", since the negation/complement of a "some" statement is another "some" statement.
+Infinitism: Aside from these responses, some philosophers have said that evidential chains terminate in beliefs that are not justified. Others have said that, indeed, there can exist infinite chains of reasons.
+Of the main responses, coherentism and skepticism are clearly consistent with evidentialism. Coherentism allows evidential support for all of our justified beliefs in the face of the regress argument by allowing for circular chains of evidential support among beliefs. And the skeptic here is utilizing an evidentialist demand to arrive at her skeptical conclusion.
+But because the resulting skepticism is so sweeping and devastating, and because so many reject the legitimacy of the circular reasoning embraced by the coherentist, foundationalism is the favored response of many philosophers to the regress argument. And foundationalism does not so clearly fit together with evidentialism. At first glance, at least, the "basic" beliefs of the foundationalist would appear to be counterexamples to the evidentialist's thesis, in that they are justified beliefs that are not rational because they are not supported by deeper evidence.
+
+== Non-evidentialist theories of knowledge and justification ==
+Many contemporary epistemologists reject the view that evidential support is the whole story about the justification of beliefs. While no sensible epistemologist generally urges people to disregard their evidence when forming beliefs, many believe that a more complete theory would introduce considerations about the processes that initiate and sustain beliefs. An example of one such theory is reliabilism. The most influential proponent of reliabilism is Alvin Goldman. According to a crude form of reliabilism, S is justified in believing p if and only if S's belief in p is caused by a reliable process—a process that generally leads to true beliefs. Some of these reliable processes may require the processing of evidence; many others won't. So, Goldman would argue, evidentialism, on which the justification of a belief always turns completely on the issue of the belief's evidential support, is false. Likewise, evidentialism will be rejected by more sophisticated versions of reliabilism, some of which will allow evidence an important but limited role, as opposed to the all-encompassing role assigned to it by evidentialism.
+Other non-evidentialist theories include: the causal theory, according to which S knows p if and only if S's belief in p is causally connected in an appropriate way with the fact that p; and Robert Nozick's truth tracking theory, according to which S knows p if and only if (i) p is true, (ii) S believes p, (iii) S's attitude toward p tracks the truth value of p in that, when p is not true, S does not believe p and when p is true, S does believe p.
+Another alternative perspective, promoted by David Hume's 18th-century opponent, Presbyterian philosopher Thomas Reid, and perhaps hinted at by Hume himself, at least in some moods (though this is a very controversial issue in interpreting Hume), has it that some of our "natural" beliefs—beliefs we are led to form by natural features of the human constitution—have what can be called an "innocent-until-proven-guilty" status. Contrary to evidentialism, they can be justified in the absence of any effective evidence that supports them. They are justified just so long as one doesn't have good reason to think them false.
+A new account of the extent of our evidence is Timothy Williamson's claim that E=K: one's evidence is what one knows. Going by the "letter of the law," Williamson's resulting theory is not contrary to, but is rather an instance of, evidentialism. By allowing our evidence to encompass everything we know, Williamson is able to give thoroughly evidentialist accounts of many important epistemological concepts. But, traditionally, evidentialists have presupposed much more restrictive accounts of what our evidence is. Thus, Williamson's theory is opposed to the spirit of much traditional evidentialism, primarily because it turns evidentialism from an internalist account of justification to an externalist account (due to the factive nature of knowledge). However, Williamson's work may point to a quite general way to modify traditional evidentialism to make it better able to meet the challenges it faces: whether or not one goes so far as to accept that E=K, broadening one's view of what constitutes our evidence may provide a way to address many of the objections to evidentialism, especially for those disinclined to swallow the skeptical consequences of a view.
+
+== Notes ==
+
+== References ==
+Conee; Feldman (2004), Evidentialism, Oxford University Press.
+
+== External links ==
+Fieser, James; Dowden, Bradley (eds.). "Evidentialism". Internet Encyclopedia of Philosophy. ISSN 2161-0002. OCLC 37741658. by Dan Mittag of the University of Rochester
+Kelly, Thomas. "Evidence". In Zalta, Edward N. (ed.). Stanford Encyclopedia of Philosophy. ISSN 1095-5054. OCLC 429049174.
+Evidentialism at the Indiana Philosophy Ontology Project
+Evidentialism at PhilPapers
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidentiality-0.md b/data/en.wikipedia.org/wiki/Evidentiality-0.md
new file mode 100644
index 000000000..2ba87fbe6
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidentiality-0.md
@@ -0,0 +1,39 @@
+---
+title: "Evidentiality"
+chunk: 1/4
+source: "https://en.wikipedia.org/wiki/Evidentiality"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:06.594635+00:00"
+instance: "kb-cron"
+---
+
+In linguistics, evidentiality is, broadly, the indication of the nature of evidence for a given statement; that is, whether evidence exists for the statement and if so, what kind. An evidential (also verificational or validational) is the particular grammatical element (affix, clitic, or particle) that indicates evidentiality. Languages with only a single evidential have had terms such as mediative, médiatif, médiaphorique, and indirective used instead of evidential.
+Evidentiality may be direct or indirect: direct evidentials are used to describe information directly perceived by the speaker through vision as well as other sensory experiences, while indirect evidentials consist of the other grammatical markers for evidence, such as quotatives and inferentials.
+
+== Introduction ==
+All languages have some means of specifying the source of information. European languages (such as Germanic and Romance languages) often use modal verbs (Spanish: deber de, Dutch: zouden, Danish: skulle, German: sollen) or other lexical words (adverbials, English: reportedly) or phrases (English: it seems to me).
+Some languages have a distinct grammatical category of evidentiality that is required to be expressed at all times. In contrast, the elements in European languages indicating the information source are optional and usually do not indicate evidentiality as their primary function; thus, they do not form a grammatical category. The obligatory elements of grammatical evidentiality systems may be translated into English, variously, as I hear that, I see that, I think that, as I hear, as I can see, as far as I understand, they say, it is said, it seems, it seems to me that, it looks like, it appears that, it turns out that, alleged, stated, allegedly, reportedly, obviously, etc.
+Alexandra Aikhenvald (2004) reports that about a quarter of the world's languages have some type of grammatical evidentiality. Laura Mazzoni has since conducted a preliminary study on evidentiality in Italian Sign Language (LIS).
+Grammatical evidentiality may be expressed in different forms depending on the language, such as through affixes, clitics, or particles. For example, Japanese has inferential evidentials and reportive markers that are realized as suffixes on a variety of mainly verbal predicates, and as grammaticalized nouns. As another example, Eastern Pomo uses four evidential suffixes that are added to verbs: -ink’e (nonvisual sensory), -ine (inferential), -·le (hearsay), and -ya (direct knowledge).
+
+Many languages with grammatical evidentiality mark evidentiality independently from tense-aspect or epistemic modality, which is the speaker's evaluation of the information, i.e. whether it is reliable, uncertain, probable.
+The use of evidentiality has pragmatic implications. In languages that do not mark evidentiality distinctly from epistemic modality, for example, a person who makes a false statement qualified as a belief may be considered mistaken, while a person who makes a false statement qualified as a personally observed fact will probably be considered to have lied. More generally, a speaker of a language that does have obligatory grammatical evidentiality is required to cognitively engage with the source of their belief of any statement in a manner that the speaker of languages without obligatory evidentiality may gloss over.
+In some languages, evidential markers also serve other purposes, such as indicating the speaker's attitude towards, or belief in, the statement. Usually a direct evidential marker may serve to indicate that the speaker is certain about the event stated. Using an indirect evidential marker, such as one for hearsay or reported information, may indicate that the speaker is uncertain about the statement, or doesn't want to take responsibility for its truth. A "hearsay" evidential may then have the undertone of "that's what they say; whether or not it's true is nothing I can take responsibility for". In other languages, this is not the case. Therefore, one should distinguish between evidential markers that only mark the source of knowledge and evidential markers that serve other functions, such as marking epistemic modality.
+Evidentials can also be used to "deflect culpability" in a statement. In his dissertation on Nanti, a Peruvian Amazonian language, Lev Michael refers to an example in which a young girl is accidentally burned, and a community member questions her mother about how it happened. Her mother uses the evidential marker ka, which translates to "presumably," to deflect responsibility for the girl's mistake.
+Some languages are borderline cases. For example, the Romance languages are mostly like English in not having grammatical evidentiality, but do have a conditional mood which has three uses: conditions, future-in-the-past, and hearsay. Thus in journalistic French, there is frequently a distinction between Il a reconnu sa culpabilité and Il aurait reconnu sa culpabilité: both translate to "He has admitted his guilt," but with an implication of certainty with the first, and the idea of "reportedly" with the second. The same happens in Spanish (Él ha reconocido su culpa vs. Él habría reconocido su culpa) and in Portuguese (Ele reconheceu sua culpa vs. Ele teria reconhecido sua culpa).
+Aikhenvald identified five semantic categories that recurrently occur across languages of the world:
+
+Visual Sensory
+Non-Visual Sensory
+Inferentials
+Hearsay Reportatives
+Quotative Reportatives
+No language has been reported to have special forms for smell, taste or feeling although these may be covered by non-visual evidentials.
+
+== Types according to Aikhenvald ==
+Following the typology of Alexandra Aikhenvald, there are two broad types of evidential marking:
+
+indirectivity marking ("type I")
+evidential marking ("type II")
+The first type (indirectivity) indicates whether evidence exists for a given statement, but does not specify what kind of evidence. The second type (evidentiality proper) specifies the kind of evidence (such as whether the evidence is visual, reported, or inferred).
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidentiality-1.md b/data/en.wikipedia.org/wiki/Evidentiality-1.md
new file mode 100644
index 000000000..7cf2d4f89
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidentiality-1.md
@@ -0,0 +1,74 @@
+---
+title: "Evidentiality"
+chunk: 2/4
+source: "https://en.wikipedia.org/wiki/Evidentiality"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:06.594635+00:00"
+instance: "kb-cron"
+---
+
+=== Indirectivity (type I) ===
+Indirectivity (also known as inferentiality) systems are common in Uralic and Turkic languages. These languages indicate whether evidence exists for a given source of information; thus, they contrast direct information (reported directly) and indirect information (reported indirectly, focusing on its reception by the speaker/recipient). Unlike the other evidential "type II" systems, an indirectivity marking does not indicate information about the source of knowledge: it is irrelevant whether the information results from hearsay, inference, or perception. However, some Turkic languages distinguish between reported indirect and non-reported indirect (see Johanson 2000, 2003 for further elaboration). This can be seen in the following Turkish verbs:
+
+In the word geldi, the unmarked suffix -di indicates past tense. In the second word gelmiş, the suffix -miş also indicates past tense but indirectly. It may be translated into English with the added phrases 'obviously', 'apparently' or 'as far as I understand'. The direct past tense marker -di is unmarked (or neutral) in the sense that whether or not evidence exists supporting the statement is not specified.
+
+=== Evidentiality (type II) ===
+The other broad type of evidentiality systems ("type II") specifies the nature of the evidence supporting a statement. These kinds of evidence can be divided into such categories as:
+
+Sensory
+Visual
+Non-visual
+Inferential
+Assumed
+Reportative
+Hearsay
+Quotative
+Sensory evidentials can often be divided into different types. Some languages mark visual evidence differently from nonvisual evidence that is heard, smelled, or felt. The Kashaya language has a separate auditory evidential.
+An inferential evidential indicates information was not personally experienced but was inferred from indirect evidence. Some languages have different types of inferential evidentials. Some of the inferentials found indicate:
+
+Information inferred by direct physical evidence
+Information inferred by general knowledge
+Information inferred/assumed because of speaker's experience with similar situations
+Past deferred realization
+In many cases, different inferential evidentials also indicate epistemic modality, such as uncertainty or probability (see epistemic modality below). For example, one evidential may indicate that the information is inferred but of uncertain validity, while another indicates that the information is inferred but unlikely to be true.
+Reportative evidentials indicate that the information was reported to the speaker by another person. A few languages distinguish between hearsay evidentials and quotative evidentials. Hearsay indicates reported information that may or may not be accurate. A quotative indicates the information is accurate and not open to interpretation, i.e., is a direct quotation. An example of a reportative from Shipibo (-ronki):
+
+==== Typology of evidentiality systems ====
+The following is a brief survey of evidential systems found in the languages of the world as identified in Aikhenvald (2004). Some languages only have two evidential markers while others may have six or more. The system types are organized by the number of evidentials found in the language. For example, a two-term system (A) will have two different evidential markers; a three-term system (B) will have three different evidentials. The systems are further divided by the type of evidentiality that is indicated (e.g. A1, A2, A3, etc.). Languages that exemplify each type are listed in parentheses.
+The most common system found is the A3 type.
+Two-term systems:
+
+A1. witness, nonwitness (e.g. Jarawara, Yukaghir languages, Mỹky, Godoberi, Kalasha-mun, Khowar, Yanam)
+A2. nonfirsthand, everything else (e.g. Abkhaz, Mansi, Khanty, Nenets, Enets, Selkup, Northeast Caucasian languages)
+A3. reported, everything else (e.g. Turkic languages, Tamil, Enga, Tauya, Lezgian, Kham, Estonian, Livonian, Tibeto-Burman languages, several South American languages)
+Three-term systems:
+
+B1. visual sensory, inferential, reportative (e.g. Aymara, Shastan languages, Qiang languages, Maidu, most Quechuan languages, Northern Embera languages)
+B2. visual sensory, nonvisual sensory, inferential (e.g. Washo)
+B3. nonvisual sensory, inferential, reportative (e.g. Retuarã, Northern Pomo)
+B4. witness (direct), nonwitness (indirect), inferential, reportative (e.g. Tsezic and Dagestanian languages)
+Four-term systems:
+
+C1. visual sensory, nonvisual sensory, inferential, reportative (e.g. Tariana, Xamatauteri, Eastern Pomo, East Tucanoan languages)
+C2. visual sensory, inferential #1, inferential #2, reportative (e.g. Tsafiki, Pawnee, Ancash Quechua)
+C3. nonvisual sensory, inferential #1, inferential #2, reportative (e.g. Wintu)
+C4. visual sensory, inferential, reportative #1, reportative #2 (e.g. Southeastern Tepehuan)
+C5. witness (non-subjective, non-renarrative), inferential (subjective, non-renarrative), renarrative (non-subjective, renarrative), dubitative (subjective, renarrative) (e.g. Bulgarian)
+Five-plus term systems:
+
+visual sensory, nonvisual sensory, inferential, reportative, assumed (e.g. Tuyuca, Tucano)
+witness, inferential, reportative, assumed, "internal support" (e.g. Nambikwaran languages)
+visual sensory, nonvisual sensory, inferential, reported, heard from known source, direct participation (e.g. Fasu)
+nonvisual sensory, inferential #1, inferential #2, inferential #3, reportative (e.g. Western Apache)
+inferential, anticipation, performative, deduction, induction, hearsay, direct observation, opinion, assumed, "to know by culture", "to know by internal" (Lojban)
+
+== Evidentiality marking and other categories ==
+Evidential systems in many languages are often marked simultaneously with other linguistic categories. For example, according to Aikhenvald, a given language may use the same element to mark both evidentiality and mirativity, i.e., unexpected information. She claims that this is the case of Western Apache where the post-verbal particle lą̄ą̄ primarily functions as a mirative but also has a secondary function as an inferential evidential. This phenomenon of evidentials developing secondary functions, or other grammatical elements such as miratives and modal verbs developing evidential functions is fairly widespread. The following types of mixed systems have been reported:
+
+evidentiality with mirativity
+evidentiality with tense-aspect
+evidentiality with modality (this is discussed in the next section below)
+In addition to the interactions with tense, modality, and mirativity, the usage of evidentials in some languages may also depend on the clause type, discourse structure, and/or linguistic genre.
+However, despite the intersection of evidentiality systems with other semantic or pragmatic systems (through grammatical categories), Aikhenvald believes that several languages do mark evidentiality without any grammatical connection to these other semantic/pragmatic systems. More explicitly stated, she believes that there are modal systems which do not express evidentiality, and evidential systems which do not express modality. Likewise, there are mirative systems which do not express evidentiality, and evidential systems which do not express mirativity.
+Aside from those, egophoricity may interact with evidentiality as well.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidentiality-2.md b/data/en.wikipedia.org/wiki/Evidentiality-2.md
new file mode 100644
index 000000000..7c3f4999e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidentiality-2.md
@@ -0,0 +1,64 @@
+---
+title: "Evidentiality"
+chunk: 3/4
+source: "https://en.wikipedia.org/wiki/Evidentiality"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:06.594635+00:00"
+instance: "kb-cron"
+---
+
+=== Tense ===
+Some languages may only distinguish between direct and indirect evidentials in the past tense. This is the case for Georgian (Kartvelian), Turkish (Turkic), Komi-Zyrian (Finno-Ugric), Haida (a language isolate in British Columbia and Alaska), and Ika (Chibchan).
+
+=== Epistemic modality ===
+Evidentiality is often considered to be a sub-type of epistemic modality (see, for example, Palmer 1986, Kiefer 1994). Other linguists consider evidentiality (marking the source of information in a statement) to be distinct from epistemic modality (marking the degree of confidence in a statement). An English example:
+
+I see that he is coming. (evidential)
+I know that he is coming. (epistemic)
+For instance, de Haan states that evidentiality asserts evidence while epistemic modality evaluates evidence and that evidentiality is more akin to a deictic category marking the relationship between speakers and events/actions (like the way demonstratives mark the relationship between speakers and objects; see also Joseph 2003). Aikhenvald (2003) finds that evidentials may indicate a speaker's attitude about the validity of a statement but this is not a required feature of evidentials. Additionally, she finds that evidential-marking may co-occur with epistemic-marking, but it may also co-occur with aspectual/tense or mirative marking.
+Considering evidentiality as a type of epistemic modality may only be the result of analyzing non-European languages in terms of the systems of modality found in European languages. For example, the modal verbs in Germanic languages are used to indicate both evidentiality and epistemic modality (and are thus ambiguous when taken out of context). Other (non-European) languages clearly mark these differently. De Haan (2001) finds that the use of modal verbs to indicate evidentiality is comparatively rare (based on a sample of 200 languages).
+
+=== Clause type ===
+Evidential categories are more likely to be marked in a main declarative clause than in the other types of clauses. In some languages, however, evidential forms may appear in questions or commands as well.
+
+=== Terminology ===
+Although some linguists have proposed that evidentiality should be considered separately from epistemic modality, other linguists conflate the two. Because of this conflation, some researchers use the term evidentiality to refer both to the marking of the knowledge source and the commitment to the truth of the knowledge.
+
+== In English (not grammaticalized) ==
+Evidentiality is not considered a grammatical category in English because it is expressed in diverse ways and is always optional. In contrast, many other languages (including Quechua, Aymara, and Yukaghir) require the speaker to mark the main verb or the sentence as a whole for evidentiality, or offer an optional set of affixes for indirect evidentiality, with direct experience being the default assumed mode of evidentiality.
+Consider these English sentences:
+
+I am hungry.
+Bob is hungry.
+We are unlikely to say the second unless someone (perhaps Bob himself) has told us that Bob is hungry. (We might still say it for someone incapable of speaking for themself, such as a baby or a pet.) If we are simply assuming that Bob is hungry based on the way he looks or acts, we are more likely to say something like:
+
+Bob looks hungry.
+Bob seems hungry.
+Bob would be hungry by now.
+Bob must be hungry by now.
+Here, the fact that we are relying on sensory evidence, rather than direct experience, is conveyed by our use of the word look or seem.
+Another situation in which the evidential modality is expressed in English is in certain kinds of predictions, namely those based on the evidence at hand. These can be referred to as "predictions with evidence". Examples:
+
+Look at those clouds! It's going to rain! (Compare "It will rain!").
+
+=== Possible exceptions ===
+The suffix "-ish" can be considered to be a grammaticalized marker of uncertainty.
+
+== Western history of the concept ==
+The notion of evidentiality as obligatory grammatical information was first made apparent in 1911 by Franz Boas in his introduction to The Handbook of American Indian Languages in a discussion of Kwakiutl and in his grammatical sketch of Tsimshianic. The term evidential was first used in the current linguistic sense by Roman Jakobson in 1957 in reference to Balkan Slavic (Jacobsen 1986:4; Jakobson 1990) with the following definition:
+
+"EnEns/Es evidential is a tentative label for the verbal category which takes into account three events — a narrated event (En), a speech event (Es), and a narrated speech event (Ens). The speaker reports an event on the basis of someone else's report (quotative, i.e. hearsay evidence), of a dream (revelative evidence), of a guess (presumptive evidence) or of his own previous experience (memory evidence)."
+Jakobson also was the first to clearly separate evidentiality from grammatical mood. By the middle of the 1960s, evidential and evidentiality were established terms in linguistic literature.
+Systems of evidentiality have received focused linguistic attention only relatively recently. The first major work to examine evidentiality cross-linguistically is Chafe & Nichols (1986). A more recent typological comparison is Aikhenvald (2004).
+
+== See also ==
+Epistemology – Philosophical study of knowledge
+Linguistic modality – Phenomenon whereby language is used to discuss possible situations
+Epistemic modality – Type of linguistic modality
+Mirativity – Grammatical category which conveys surprise
+Egophoricity – Linguistic encoding of personal knowledge
+Grammatical mood – Grammatical feature of verbs
+Evidence theory – Mathematical framework to model epistemic uncertainty
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Evidentiality-3.md b/data/en.wikipedia.org/wiki/Evidentiality-3.md
new file mode 100644
index 000000000..5f80e2a1e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Evidentiality-3.md
@@ -0,0 +1,47 @@
+---
+title: "Evidentiality"
+chunk: 4/4
+source: "https://en.wikipedia.org/wiki/Evidentiality"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:06.594635+00:00"
+instance: "kb-cron"
+---
+
+== Further reading ==
+Aikhenvald, Alexandra Y.; & Dixon, R. M. W. (1998). Evidentials and areal typology: A case-study from Amazonia. Language Sciences, 20, 241–257.
+Aikhenvald, Alexandra Y.; & Dixon, R. M. W. (Eds.). (2003). Studies in evidentiality. Typological studies in language (Vol. 54). Amsterdam: John Benjamins Publishing Company. ISBN 90-272-2962-7; ISBN 1-58811-344-2.
+Aikhenvald, Alexandra Y.; & Dixon, R. M. W. (Eds.). (2014) The Grammar of Knowledge: A Cross-Linguistic Typology. Oxford University Press. ISBN 978-0-19-870131-6
+Blakemore, D. (1994). Evidence and modality. In R. E. Asher (Ed.), The Encyclopedia of language and linguistics (pp. 1183–1186). Oxford: Pergamon Press. ISBN 0-08-035943-4.
+Chafe, Wallace L.; & Nichols, Johanna. (Eds.). (1986). Evidentiality: The linguistic encoding of epistemology. Norwood, NJ: Ablex.
+Comrie, Bernard. (2000). Evidentials: Semantics and history. In L. Johanson & B. Utas (Eds.).
+de Haan, Ferdinand (2013b), "Coding of Evidentiality", in Dryer, Matthew S.; Haspelmath, Martin (eds.), WALS Online (v2020.3), retrieved February 3, 2024
+Faust, Norma. (1973). Lecciones para el aprendizaje del idioma shipibo-conibo [Lessons for learning the Shipibo-Conibo language]. Lima: Summer Institute of Linguistics.
+Guentchéva, Zlatka. (1996a). Introduction. In Z. Guentchéva (Ed.) (pp. 11–18).
+Guentchéva, Zlatka (Ed.). (1996b). L’Énonciation médiatisée. Bibliothèque de l’information grammaticale. Louvain: Éditions Peeters. ISBN 90-6831-861-6; ISBN 2-87723-244-1.
+Johanson, Lars. (2000). Turkic indirectives. In L. Johanson & B. Utas (Eds.) (pp. 61–87).
+Jacobsen, W. H. Jr. (1986). The heterogeneity of evidentials in Makah. In W. L. Chafe & J. Nichols (Eds.) (pp. 3–28).
+Jakobson, Roman. (1990). Shifters and verbal categories. In On language (pp. 386–392). Cambridge, MA: Harvard University Press. (Original work published 1957).
+Johanson, Lars. (2003). Evidentiality in Turkic. In A. Y. Aikhenvald & R. M. W. Dixon (Eds.) (pp. 273–290).
+Johanson, Lars; & Utas, Bo (Eds.). (2000). Evidentials: Turkic, Iranian and neighboring languages. Berlin: Mouton de Gruyter. ISBN 3-11-016158-3.
+Joseph, Brian D. (2003). Evidentials: Summation, questions, prospects. In A. Y. Aikhenvald & R. M. W. Dixon (Eds.) (pp. 307–327).
+Kiefer, Ferenc. (1994). Modality. In R. E. Asher (Ed.), The Encyclopedia of language and linguistics (pp. 2515–2520). Oxford: Pergamon Press.
+LaPolla, Randy J. (2003). Evidentiality in Qiang. In A. Y. Aikhenvald & R. M. W. Dixon (Eds.) (pp. 63–78).
+Maslova, Elena. (2003). Evidentiality in Yukaghir. In A. Y. Aikhenvald & R. M. W. Dixon (Eds.) (pp. 237–241).
+Noël, Dirk. (2001). The passive matrices of English infinitival complement clauses: Evidentials on the road to auxiliarihood? Studies in Language, 25, 255–296.
+Palmer, F. R. (1986). Mood and modality. Cambridge: Cambridge University Press. ISBN 0-521-26516-9, ISBN 0-521-31930-7. (2nd ed. published 2001).
+Palmer, F. R. (1994). Mood and modality. In R. E. Asher (Ed.), The Encyclopedia of language and linguistics (pp. 2535–2540). Oxford: Pergamon Press.
+Slobin, Dan Isaac; Aksu, Ayhan A. (1982). "Tense, Aspect and Modality in the Use of the Turkish Evidential" (PDF). In Hopper, Paul J. (ed.). Tense-Aspect: Between semantics & pragmatics. Typological Studies in Language. Vol. 1. John Benjamins. p. 185. doi:10.1075/tsl.1.13slo. ISBN 978-90-272-2865-9. Archived from the original on April 2, 2024.
+Speas, Peggy. (2010). Evidentials as generalized functional heads. In A. M. Di Sciullo (Ed.), Interface Legibility at the Edge. Oxford University Press.
+Willett, Thomas L. (1988). A cross-linguistic survey of the grammaticalization of evidentiality. Studies in Language, 12, 51–97.
+
+== External links ==
+
+Language & Power (Evidentiality)
+Ferdinand de Haan's research on evidentiality
+Evidentiality bibliography
+world map of the language distribution of evidentiality
+Semantics: Modality and Evidentiality
+Evidentiality in Dena’ina Athabascan
+review of Aikhenvald & Dixon (2003) Deprecated link archived 2013-01-12 at archive.today (Linguist List)
+review of Aikhenvald (2004) Deprecated link archived 2013-01-13 at archive.today (Linguist List)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Exculpatory_evidence-0.md b/data/en.wikipedia.org/wiki/Exculpatory_evidence-0.md
new file mode 100644
index 000000000..ee6e4080a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Exculpatory_evidence-0.md
@@ -0,0 +1,26 @@
+---
+title: "Exculpatory evidence"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Exculpatory_evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:07.767056+00:00"
+instance: "kb-cron"
+---
+
+Exculpatory evidence is evidence favorable to the defendant in a criminal trial that exonerates or tends to exonerate the defendant of guilt. It is the opposite of inculpatory evidence, which tends to establish guilt. In many countries, including the United States, police and prosecutors are required to disclose to the defendant exculpatory evidence they possess before the defendant enters a plea (guilty or not guilty). In some countries, such as Germany, the prosecutor has to actively search for both exculpatory and inculpatory circumstances and evidence before filing charges.
+Per the Brady v. Maryland (1963) decision, prosecutors in the United States have a duty to disclose exculpatory evidence even if not requested to do so. While the prosecution is not required to search for exculpatory evidence and must disclose only the evidence in its possession, custody, or control, the prosecution's duty is to disclose all information known to any member of its team, e.g., police, investigators, crime labs, et cetera. In Brady v. Maryland, the U.S. Supreme Court held that such a requirement follows from constitutional due process and is consistent with the prosecutor's duty to seek justice. The Brady doctrine is the pretrial discovery rule established by that decision: it requires the prosecution to turn over all exculpatory evidence to the defendant in a criminal case. Exculpatory evidence is evidence that might exonerate the defendant.
+
+
+== Illustration ==
+A victim is murdered by stabbing and a suspect is arrested for the murder. Evidence includes a knife covered with blood found near the victim and the accused found covered in blood at the murder scene. During the investigation, the police interview a witness claiming to have seen the stabbing. The witness makes a statement to the police that another unidentified person committed the crime, not the accused. The witness's statement is exculpatory evidence as it introduces reasonable doubt as to the guilt of the accused. The police either do not believe the witness's account or else find the witness unreliable and choose not to follow up on the lead. The prosecutor is obliged to inform the accused and their attorney of the witness's statement even though the police doubt the witness's version of events. Failure to do so would provide grounds for a motion to dismiss the charges or an appeal of a subsequent guilty verdict.
+
+
+== See also ==
+Brady disclosure
+Giglio v. United States (1972)
+R v Stinchcombe (1991)
+United States v. Williams (1992)
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Experiment-0.md b/data/en.wikipedia.org/wiki/Experiment-0.md
index 13fe6697c..61b17f7e1 100644
--- a/data/en.wikipedia.org/wiki/Experiment-0.md
+++ b/data/en.wikipedia.org/wiki/Experiment-0.md
@@ -4,7 +4,7 @@ chunk: 1/5
 source: "https://en.wikipedia.org/wiki/Experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:50:09.284800+00:00"
+date_saved: "2026-05-05T09:56:26.048482+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Experiment-1.md b/data/en.wikipedia.org/wiki/Experiment-1.md
index a04201ab8..3e9f5dbc6 100644
--- a/data/en.wikipedia.org/wiki/Experiment-1.md
+++ b/data/en.wikipedia.org/wiki/Experiment-1.md
@@ -4,7 +4,7 @@ chunk: 2/5
 source: "https://en.wikipedia.org/wiki/Experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:50:09.284800+00:00"
+date_saved: "2026-05-05T09:56:26.048482+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Experiment-2.md b/data/en.wikipedia.org/wiki/Experiment-2.md
index 372b2a9c1..658aedb2d 100644
--- a/data/en.wikipedia.org/wiki/Experiment-2.md
+++ b/data/en.wikipedia.org/wiki/Experiment-2.md
@@ -4,7 +4,7 @@ chunk: 3/5
 source: "https://en.wikipedia.org/wiki/Experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:50:09.284800+00:00"
+date_saved: "2026-05-05T09:56:26.048482+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Experiment-3.md b/data/en.wikipedia.org/wiki/Experiment-3.md
index eb6fef1fe..aca91cbc3 100644
--- a/data/en.wikipedia.org/wiki/Experiment-3.md
+++ b/data/en.wikipedia.org/wiki/Experiment-3.md
@@ -4,7 +4,7 @@ chunk: 4/5
 source: "https://en.wikipedia.org/wiki/Experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:50:09.284800+00:00"
+date_saved: "2026-05-05T09:56:26.048482+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Experiment-4.md b/data/en.wikipedia.org/wiki/Experiment-4.md
index c58ec7265..e466358c9 100644
--- a/data/en.wikipedia.org/wiki/Experiment-4.md
+++ b/data/en.wikipedia.org/wiki/Experiment-4.md
@@ -4,7 +4,7 @@ chunk: 5/5
 source: "https://en.wikipedia.org/wiki/Experiment"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:50:09.284800+00:00"
+date_saved: "2026-05-05T09:56:26.048482+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Experimental_software_engineering-0.md b/data/en.wikipedia.org/wiki/Experimental_software_engineering-0.md
new file mode 100644
index 000000000..0ac416148
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Experimental_software_engineering-0.md
@@ -0,0 +1,28 @@
+---
+title: "Experimental software engineering"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Experimental_software_engineering"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:54.579784+00:00"
+instance: "kb-cron"
+---
+
+Experimental software engineering involves running experiments on the processes and procedures involved in the creation of software systems, with the intent that the data be used as the basis of theories about the processes involved in software engineering (theory backed by data is a fundamental tenet of the scientific method).  A number of research groups primarily use empirical and experimental techniques.
+The term empirical software engineering emphasizes the use of empirical studies of all kinds to accumulate knowledge.  Methods used include experiments, case studies, surveys, and using whatever data is available.
+
+
+== Empirical software engineering research ==
+In a keynote at the International Symposium on Empirical Software Engineering and Measurement, Prof. Wohlin recommended ten commitments that the research community should follow to increase the relevance and impact of empirical software engineering research. However, at the same conference, Dr. Ali argued that following these commitments alone will not be enough: beyond showing evidence that substantiates the claimed benefits of an intervention, practical relevance and potential impact require evidence of cost-effectiveness.
+The International Software Engineering Research Network (ISERN) is a global community of research groups who are active in experimental software engineering. Its purpose is to advance the practice of and foster university and industry collaborations within experimental software engineering. ISERN holds annual meetings in conjunction with the International Symposium on Empirical Software Engineering and Measurement (ESEM) conference.
+
+
+== References ==
+
+
+== Bibliography ==
+Victor Basili, Richard W. Selby, David H. Hutchens, "Experimentation in Software Engineering", IEEE Transactions on Software Engineering, Vol. SE-12, No.7, July 1986
+Basili, V.; Rombach, D.; Schneider, K.; Kitchenham, B.; Pfahl, D.; Selby, R. (Eds.), Empirical Software Engineering Issues. Critical Assessment and Future Directions, Springer-Verlag, 2007, ISBN 978-3-540-71300-5.
+Barry Boehm, Hans Dieter Rombach, and Marvin V. Zelkowitz (eds.), Foundations of Empirical Software Engineering — The Legacy of Victor R. Basili, Springer-Verlag, 2005, ISBN 3-540-24547-2.
+Jones, D. Evidence-based Software Engineering based on the publicly available data, 2020, ISBN 978-1-8382913-0-3
+H. Dieter Rombach, Victor R. Basili and Richard W. Selby (eds.), Experimental Software Engineering Issues: Critical Assessment and Future Directions, Springer-Verlag, 1993, ISBN 3-540-57092-6.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Extraordinary_claims_require_extraordinary_evidence-0.md b/data/en.wikipedia.org/wiki/Extraordinary_claims_require_extraordinary_evidence-0.md
new file mode 100644
index 000000000..bcd9add78
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Extraordinary_claims_require_extraordinary_evidence-0.md
@@ -0,0 +1,62 @@
+---
+title: "Extraordinary claims require extraordinary evidence"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Extraordinary_claims_require_extraordinary_evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:09.060729+00:00"
+instance: "kb-cron"
+---
+
+"Extraordinary claims require extraordinary evidence" (sometimes shortened to ECREE), also known as the Sagan standard, is an aphorism popularized by science communicator Carl Sagan. He used the phrase in his 1979 book Broca's Brain and the 1980 television program Cosmos. It has been described as fundamental to the scientific method and is regarded as encapsulating the basic principles of scientific skepticism.
+The concept is similar to Occam's razor in that both heuristics prefer simpler explanations of a phenomenon to more complicated ones. In application, there is some ambiguity regarding when evidence is deemed sufficiently "extraordinary". It is often invoked to challenge data and scientific findings, or to criticize pseudoscientific claims. Some critics have argued that the standard can suppress innovation and affirm confirmation biases.
+Philosopher David Hume characterized the principle in his 1748 essay "Of Miracles". Similar statements were made by figures such as Thomas Jefferson in 1808, Pierre-Simon Laplace in 1814, and Théodore Flournoy in 1899. The formulation "extraordinary claims require extraordinary proof" was used a year prior to Sagan, by scientific skeptic Marcello Truzzi.
+
+
+== Application ==
+
+The aphorism "extraordinary claims require extraordinary evidence", according to psychologist Patrizio Tressoldi, "is at the heart of the scientific method, and a model for critical thinking, rational thought and skepticism everywhere". It has also been described as a "fundamental principle of scientific skepticism". The phrase is often used in the context of paranormal and other pseudoscientific claims. It is also frequently invoked in scientific literature to challenge research proposals, like a new species of Amazonian tapir, biparental inheritance of mitochondrial DNA, or a Holocene "mega-tsunami".
+The concept is related to Occam's razor as, according to such a heuristic, simpler explanations are preferred to more complicated ones. Only in situations where extraordinary evidence exists would an extraordinary claim be the simplest explanation. The principle also appears in hypothesis testing, where the hypothesis that there is no evidence for the proposed phenomenon, known as the "null hypothesis", is preferred. The formal argument involves assigning a stronger Bayesian prior to the acceptance of the null hypothesis as opposed to its rejection.
+
+
+== Origin and precursors ==
+
+Science communicator Carl Sagan popularized the aphorism in his 1979 book Broca's Brain, and in his 1980 television show Cosmos in reference to claims about extraterrestrials visiting Earth. Sagan had first stated the eponymous standard in a 1977 interview with The Washington Post. However, scientific skeptic Marcello Truzzi used the formulation "extraordinary claims require extraordinary proof" in an article published by Parapsychology Review in 1975, as well as in a Zetetic Scholar article in 1978. Two 1978 articles quoted physicist Philip Abelson—then the editor of the journal Science—using the same phrasing as Truzzi.
+In his 1748 essay "Of Miracles", philosopher David Hume wrote that if "the fact ... partakes of the extraordinary and the marvellous ... the evidence ... received a diminution, greater or less, in proportion as the fact is more or less unusual". Deming concluded that this was the first complete elucidation of the standard. Unlike Sagan, Hume defined the nature of "extraordinary": he wrote that it was a large magnitude of evidence.
+Others had also put forward very similar ideas. Quote Investigator cites similar statements from Benjamin Bayly (in 1708), Arthur Ashley Sykes (1740), Beilby Porteus (1800), Elihu Palmer (1804), and William Craig Brownlee (1824). The French scholar Pierre-Simon Laplace, in essays (1810 and 1814) on the stability of the Solar System, wrote that "the weight of evidence for an extraordinary claim must be proportioned to its strangeness". Thomas Jefferson in an 1808 letter expressed contemporary skepticism of meteorites thus: "A thousand phenomena present themselves daily which we cannot explain, but where facts are suggested, bearing no analogy with the laws of nature as yet known to us, their verity needs proofs proportioned to their difficulty."
+
+
+== Analysis and criticism ==
+Sagan did not describe any concrete or quantitative parameters as to what constitutes "extraordinary evidence", which raises the issue of whether the standard can be applied objectively. Academic and climate-change denialist David Deming notes that it would be "impossible to base all rational thought and scientific methodology on an aphorism whose meaning is entirely subjective". He instead argues that "extraordinary evidence" should be regarded as a sufficient amount of evidence rather than evidence deemed of extraordinary quality. Tressoldi noted that the threshold of evidence is typically decided through consensus. This problem is less apparent in clinical medicine and psychology, where statistical results can establish the strength of evidence.
+Deming also noted that the standard can "suppress innovation and maintain orthodoxy". Others, like Etzel Cardeña, have noted that many scientific discoveries that spurred paradigm shifts were initially deemed "extraordinary" and likely would not have been so widely accepted if extraordinary evidence were required. Uniform rejection of extraordinary claims could affirm confirmation biases in subfields. Additionally, there are concerns that, when inconsistently applied, the standard exacerbates racial and gender biases. Psychologist Richard Shiffrin has argued that the standard should not be used to bar research from publication but to ascertain what is the best explanation for a phenomenon. Conversely, mathematical psychologist Eric-Jan Wagenmakers stated that extraordinary claims are often false and their publication "pollutes the literature". To qualify the publication of such claims, psychologist Suyog Chandramouli has suggested the inclusion of peer reviewers' opinions on their plausibility or an attached curation of post-publication peer evaluations.
+Cognitive scientist and AI researcher Ben Goertzel believes that the phrase is used as a "rhetorical meme" without critical thought. Philosopher Theodore Schick argued that "extraordinary claims do not require extraordinary evidence" if they provide the most adequate explanation. Moreover, theists and Christian apologists like William Lane Craig have argued that it is unfair to apply the standard to religious miracles, as other improbable claims are often accepted based on limited testimonial evidence, such as an individual claiming that they won the lottery.
+
+
+== See also ==
+Epistemology
+Hitchens's razor
+Logical positivism
+Philosophical razor
+Theory of justification
+Hanlon's razor
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-0.md b/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-0.md
index 300301fca..c8eaa136f 100644
--- a/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-0.md
+++ b/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-0.md
@@ -4,7 +4,7 @@ chunk: 1/3
 source: "https://en.wikipedia.org/wiki/Hierarchy_of_evidence"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:00:54.949878+00:00"
+date_saved: "2026-05-05T09:56:10.291720+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-1.md b/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-1.md
index e4fa46e20..b1a706926 100644
--- a/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-1.md
+++ b/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-1.md
@@ -4,7 +4,7 @@ chunk: 2/3
 source: "https://en.wikipedia.org/wiki/Hierarchy_of_evidence"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:00:54.949878+00:00"
+date_saved: "2026-05-05T09:56:10.291720+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-2.md b/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-2.md
index f698b3f87..b8b02e19d 100644
--- a/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-2.md
+++ b/data/en.wikipedia.org/wiki/Hierarchy_of_evidence-2.md
@@ -4,7 +4,7 @@ chunk: 3/3
 source: "https://en.wikipedia.org/wiki/Hierarchy_of_evidence"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:00:54.949878+00:00"
+date_saved: "2026-05-05T09:56:10.291720+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Lady_tasting_tea-0.md b/data/en.wikipedia.org/wiki/Lady_tasting_tea-0.md
index 3be66e55d..aa39f8d98 100644
--- a/data/en.wikipedia.org/wiki/Lady_tasting_tea-0.md
+++ b/data/en.wikipedia.org/wiki/Lady_tasting_tea-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Lady_tasting_tea"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:50:32.274299+00:00"
+date_saved: "2026-05-05T09:56:39.095339+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population-0.md b/data/en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population-0.md
new file mode 100644
index 000000000..57ab0cf38
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population-0.md
@@ -0,0 +1,137 @@
+---
+title: "Law of hyperbolic growth of the human population"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:12.676679+00:00"
+instance: "kb-cron"
+---
+
+The law of hyperbolic growth of the human population is an empirical law discovered by Heinz von Foerster, which states that the human population of the Earth has grown hyperbolically over several millennia. In the article published by Foerster et al. it was noted that hyperbolic growth is possible only if humanity acts "as a single player", that is, under the condition of some form of cooperation among all people on Earth. Most authors explain the hyperbolic growth by the joint intellectual development of humanity. At the same time, many (S. Kuznets, J. Simon, M. Kremer, S. V. Tsirel, A. V. Korotayev and others) consider the development of technologies as the main factor. A. V. Podlazov highlights life-saving technologies, which are understood not only as production methods but "in general, any knowledge and skills that can be used to save a person from death or prolong their life". S. P. Kapitsa and a number of other authors name the accumulation of knowledge and information in general as the cause of growth.
+
+== Boundaries of the law's application ==
+According to statistical data, the law of hyperbolic growth ceased to hold in the 1960s–1970s. Since 1989, the absolute rate of world population growth has also begun to decline, so it is no longer possible to speak even of linear population growth. According to the model of the French physician Jean-Noël Biraben, the growth limit will be 10–12 billion people; most other models suggest fairly similar levels of world population stabilization. Scenarios in which the Earth's population decreases after reaching its maximum are also quite plausible.
+Various views have been expressed regarding the beginning of the hyperbolic law's action. In the work of Heinz von Foerster, it was shown that the law of hyperbolic growth has been in effect since the beginning of the Common Era. Astrophysicist Sebastian von Hoerner believed that the hyperbolic law operated throughout the existence of humanity. S. P. Kapitsa, based on the model he developed, calculated the date of the law's beginning as 1.6 million years ago. Other authors usually limit themselves to the period for which there are more or less reliable empirical estimates, for example 40 or 10 thousand years.
+Although the general hyperbolic nature of demographic dynamics is not in doubt, a careful analysis of empirical data shows that the parameters of the hyperbola were not constant. In particular, before the beginning of the Common Era (5th–1st millennium BC), the growth rate was higher than later. A significant change in parameters in the 1st millennium AD is masked by the explosive population growth in recent centuries, compared to which all the vicissitudes of previous history seem insignificant.
+
+== Mathematical formulations ==
+The law received its name because the dynamics of the Earth's population approximately corresponds to a hyperbola – a second-order mathematical curve:
+
+$$N(t) = \frac{C}{t_0 - t}.$$
+
+Here $N(t)$ is the world population in year $t$; $t_0$ is the so-called singularity, the point in time when the world population would become infinite if hyperbolic growth continued (2025, according to von Hoerner's calculations); and $C$ is a constant (for von Hoerner, 200 billion person-years). Hyperbolic growth is most clearly manifested through doublings: each subsequent doubling of humanity's population occurred approximately twice as fast as the previous one. This can be especially clearly observed in the interval 1650–1970.
+The law can also be represented in differential form:
+
+$$\frac{dN}{dt} = \frac{N^2}{C},$$
+
+that is, the population growth rate is proportional to the square of the current population. Since these equations correspond to unlimited growth at the singularity point, a number of authors, starting with M. Kremer and S. P. Kapitsa, build models describing the deviation from this singularity, which has actually been occurring since the 1960–1970s.
+
+== Technological justification of hyperbolic growth ==
+M. Kremer proposed a rigorous mathematical justification for hyperbolic growth, based on the assumptions that population size is proportional to the level of technological development, and that the rate of technological development, in turn, depends on the number of "inventors", which is proportional to the population size. Most recently developed models of human population growth are based on Kremer's equation. The Korotayev–Malkov–Khalturina model, which also includes Kremer's equation, stands out especially. Without claiming to describe the entire demographic history of humanity, it describes the growth dynamics very well over the periods 5000 BC–500 AD and 500–2025 AD (forecast).
+The Kuznets–Kremer theory has been criticized for its literal assumption that in any era there is a constant number of "standard inventors" per thousand people, each with unchanging efficiency in improving technologies. In particular, critics point out that "in fact, the vast majority of inventions were obtained in individual, often small, countries in special eras (ancient Greece, Song China, Italy of the Renaissance era, England during the Industrial Revolution and others), while huge regions of the world invented very little" (S. V. Tsirel).
+
+== Life-saving technologies ==
+A. V. Podlazov's emphasis on life-saving technologies rests on the idea that skills and knowledge contributing to people's survival spread the fastest. In times when humanity was divided by insurmountable distances and communication between peoples was irregular, only such information, the most relevant to everyone, could spread quickly enough. A. V. Podlazov also developed a model that describes the dynamics of human population growth very well.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population-1.md b/data/en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population-1.md
new file mode 100644
index 000000000..72542a778
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population-1.md
@@ -0,0 +1,27 @@
+---
+title: "Law of hyperbolic growth of the human population"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Law_of_hyperbolic_growth_of_the_human_population"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:12.676679+00:00"
+instance: "kb-cron"
+---
+
+== Accumulation of information ==
+The works of S. P. Kapitsa argue that human development is independent of available resources. On this basis, he advances the principle of the demographic imperative: that demography is self-sufficient for describing human history. At the same time, the leading role in the cooperative nonlinear mechanism of development is given to the informational interaction of large groups of people. It is the accumulation of information in the process of such interaction that can explain the hyperbolic growth of the human population. Information is more fundamental than the technological level and differs from it in its integrity: any information can be in demand for creating new technologies, whereas the state of humanity cannot be described by the technologies in use alone.
+According to Kapitsa, humanity is near the inflection point of the population growth curve, which falls around 2005. After passing this point, a slowdown was expected, symmetric to the era of hyperbolic growth. Kapitsa's works are criticized for excessive physicalism.
+The accumulation of information and the associated hyperbolic growth of species diversity were also observed in the biosphere until recently (before human intervention).
+A widely discussed view is that the further development of civilization will be associated precisely with the growth of the volume of information in a human-machine supermind (co-intelligence, synergistic intelligence), possibly based on the Internet. A person can enter the supermind simply as an Internet user, or, by improving their biological nature, as a cyborg.
+
+== See also ==
+Exponential growth
+Population boom
+Demographic transition
+Doomsday argument
+
+== Notes ==
+
+== References ==
+Korotayev, A. (2007). "Compact Mathematical Models of World System Development, and How they can Help us to Clarify our Understanding of Globalization Processes". In Modelski, George; Devezas, Tessaleno; Thompson, William R. (eds.). Globalization as Evolutionary Process: Modeling Global Change. London: Routledge. pp. 133–160.
+Ozhovan, M.; Loschinin, M. (2015). "Heuristic Paradoxes of S. P. Kapitsa's Theoretical Demography" (PDF). European Researcher (92 (3)): 237–248.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Law_of_superposition-0.md b/data/en.wikipedia.org/wiki/Law_of_superposition-0.md
new file mode 100644
index 000000000..0096ca6f7
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Law_of_superposition-0.md
@@ -0,0 +1,45 @@
+---
+title: "Law of superposition"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Law_of_superposition"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:33.767014+00:00"
+instance: "kb-cron"
+---
+
+The law of superposition is an axiom that forms one of the bases of the sciences of geology, archaeology, and other fields pertaining to geological stratigraphy. In its plainest form, it states that in undeformed stratigraphic sequences, the oldest strata will lie at the bottom of the sequence, while newer material stacks upon the surface to form new deposits over time. This is paramount to stratigraphic dating, which requires a set of assumptions, including that the law of superposition holds true and that an object cannot be older than the materials of which it is composed. To illustrate the practical applications of superposition in scientific inquiry, sedimentary rock that has not been deformed by more than 90° will exhibit the oldest layers on the bottom, thus enabling paleontologists and paleobotanists to identify the relative ages of any fossils found within the strata, with the remains of the most archaic lifeforms confined to the lowest. These findings can inform the community on the fossil record covering the relevant strata, to determine which species coexisted temporally and which species existed successively in perhaps an evolutionarily or phylogenetically relevant way.
+
+
+== History ==
+The law of superposition was first proposed in 1669 by the Danish scientist Nicolas Steno, and appears as one of his major theses in the seminal work Dissertationis prodromus (1669).
+In the English-language literature, the law was popularized by William "Strata" Smith, who used it to produce the first geologic map of Britain.  It is the first of Smith's laws, which were formally published in Strata Identified by Fossils (1816–1819).
+
+
+== Archaeological considerations ==
+Superposition in archaeology, and especially its use in stratification during excavation, is slightly different, as the processes involved in laying down archaeological strata differ somewhat from geological processes. Human-made intrusions and activity in the archaeological record need not form chronologically from top to bottom or be deformed from the horizontal as natural strata are by equivalent processes. Some archaeological strata (often termed contexts or layers) are created by undercutting previous strata. For example, the silt back-fill of an underground drain forms some time after the ground immediately above it. Other examples of non-vertical superposition are modifications to standing structures, such as the creation of new doors and windows in a wall. Superposition in archaeology requires a degree of interpretation to correctly identify chronological sequences, and in this sense it is more dynamic and multi-dimensional.
+
+
+== Other limitations to stratification and superposition ==
+Original stratification induced by natural processes can subsequently be disrupted or permutated by a number of factors, including animal interference and vegetation, as well as limestone crystallization. 
+Stratification behaves in a different manner with surface-formed igneous depositions, such as lava flows and ash falls, and thus superposition may not always successfully apply under certain conditions. 
+
+
+== See also ==
+
+Harris matrix
+Principle of cross-cutting relationships
+Principle of faunal succession
+Principle of lateral continuity
+Principle of original horizontality
+Stratification (archeology)
+Stratigraphy
+Structural geology
+
+
+== References ==
+
+
+=== General sources ===
+Hamblin, W.K.  The Earth's Dynamic Systems: A Textbook in Physical Geology, by W. Kenneth Hamblin, BYU, Provo, UT, Illus. William L. Chesser, Dennis Tasa, (Burgess Publishing Company, Minneapolis, Minnesota), c. 1978, p. 115, "The Principle of Superposition and Original Horizontality;" p. 116: The Law of Faunal Succession, "The Principle of Crosscutting Relations;" pp. 116-17: "The Principle of Inclusion," (as in the Steno discussion above).
+Principles of Archaeological Stratigraphy. 40 figs. 1 pl. 136 pp. London & New York: Academic Press ISBN 0-12-326650-5
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/List_of_experiments-0.md b/data/en.wikipedia.org/wiki/List_of_experiments-0.md
index 64e6922c3..26bac1613 100644
--- a/data/en.wikipedia.org/wiki/List_of_experiments-0.md
+++ b/data/en.wikipedia.org/wiki/List_of_experiments-0.md
@@ -4,7 +4,7 @@ chunk: 1/3
 source: "https://en.wikipedia.org/wiki/List_of_experiments"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:23:33.368384+00:00"
+date_saved: "2026-05-05T09:56:27.280362+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/List_of_experiments-1.md b/data/en.wikipedia.org/wiki/List_of_experiments-1.md
index 05b14efc5..98d05ca27 100644
--- a/data/en.wikipedia.org/wiki/List_of_experiments-1.md
+++ b/data/en.wikipedia.org/wiki/List_of_experiments-1.md
@@ -4,7 +4,7 @@ chunk: 2/3
 source: "https://en.wikipedia.org/wiki/List_of_experiments"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:23:33.368384+00:00"
+date_saved: "2026-05-05T09:56:27.280362+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/List_of_experiments-2.md b/data/en.wikipedia.org/wiki/List_of_experiments-2.md
index 2f29553fe..0c60cec44 100644
--- a/data/en.wikipedia.org/wiki/List_of_experiments-2.md
+++ b/data/en.wikipedia.org/wiki/List_of_experiments-2.md
@@ -4,7 +4,7 @@ chunk: 3/3
 source: "https://en.wikipedia.org/wiki/List_of_experiments"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:23:33.368384+00:00"
+date_saved: "2026-05-05T09:56:27.280362+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/List_of_experiments_in_physics-0.md b/data/en.wikipedia.org/wiki/List_of_experiments_in_physics-0.md
index bcc8385a6..c0808d4be 100644
--- a/data/en.wikipedia.org/wiki/List_of_experiments_in_physics-0.md
+++ b/data/en.wikipedia.org/wiki/List_of_experiments_in_physics-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/List_of_experiments_in_physics"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T08:21:22.807305+00:00"
+date_saved: "2026-05-05T09:56:37.889335+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Marine_isotope_stages-0.md b/data/en.wikipedia.org/wiki/Marine_isotope_stages-0.md
new file mode 100644
index 000000000..c16ff3cff
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Marine_isotope_stages-0.md
@@ -0,0 +1,23 @@
+---
+title: "Marine isotope stages"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Marine_isotope_stages"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:34.928910+00:00"
+instance: "kb-cron"
+---
+
+Marine isotope stages (MIS), marine oxygen-isotope stages, or oxygen isotope stages (OIS), are alternating warm and cool periods in the Earth's paleoclimate, deduced from oxygen isotope data derived from deep sea core samples.  Working backwards from the present, which is MIS 1 in the scale, stages with even numbers have high levels of oxygen-18 and represent cold glacial periods, while the odd-numbered stages are lows in the oxygen-18 figures, representing warm interglacial intervals. The data are derived from pollen and foraminifera (plankton) remains in drilled marine sediment cores, sapropels, and other data that reflect historic climate; these are called proxies.
+The MIS timescale was developed from the pioneering work of Cesare Emiliani in the 1950s, modifying an earlier system introduced by oceanographer Gustaf Arrhenius. It is now widely used in  archaeology and other fields to express dating in the Quaternary period (the last 2.6 million years), as well as providing the fullest and best data for that period for paleoclimatology or the study of the early climate of the Earth, representing "the standard to which we correlate other Quaternary climate records".  Emiliani's work in turn depended on Harold Urey's prediction in a paper of 1947 that the ratio between oxygen-18 and oxygen-16 isotopes in calcite, the main chemical component of the shells and other hard parts of a wide range of marine organisms, should vary depending on the prevailing water temperature in which the calcite was formed.
+Over 100 stages have been identified, currently going back some 6 million years, and the scale may eventually reach back as far as 15 million years.  Some stages, in particular MIS 5, are divided into sub-stages, such as "MIS 5a", with 5 a, c, and e being warm and b and d cold. A numeric system for referring to "horizons" (events rather than periods) may also be used, with for example MIS 5.5 representing the peak point of MIS 5e, and 5.51, 5.52 etc. representing the peaks and troughs of the record at a still more detailed level.  For more recent periods, increasingly precise resolution of timing continues to be developed.
+
+== Developing a timescale ==
+
+In 1957 Emiliani moved to the University of Miami to have access to core-drilling ships and equipment, and began to drill in the Caribbean and collect core data.  A further important advance came in 1967, when Nicholas Shackleton suggested that the fluctuations over time in the marine isotope ratios that had become evident by then were caused not so much by changes in water temperature, as Emiliani thought, but mainly by changes in the volume of ice-sheets, which when they expanded took up the lighter oxygen-16 isotope in preference to the heavier oxygen-18.  The cycles in the isotope ratio were found to correspond to terrestrial evidence of glacials and interglacials. A graph of the entire series of stages then revealed unsuspected advances and retreats of ice and also filled in the details of the stadials and interstadials.
+More recent ice core samples of today's glacial ice substantiated the cycles through studies of ancient pollen deposition. Currently a number of methods are making additional detail possible.  Matching the stages to named periods proceeds as new dates are discovered and new regions are explored geologically. The marine isotopic records appear more complete and detailed than any terrestrial equivalents, and have enabled a timeline of glaciation for the Plio-Pleistocene to be identified.  It is now believed that changes in the size of the major ice sheets such as the historical Laurentide Ice Sheet of North America are the main factor governing variations in the oxygen isotope ratios.
+The MIS data also matches the astronomical data of Milankovitch cycles of orbital forcing or the effects of variations in insolation caused by cyclical slight changes in the tilt of the Earth's axis of rotation – the "orbital theory". Indeed, the close match between the MIS data and Milankovitch's theory, which he formed during World War I, was a key factor in the theory gaining general acceptance, despite some remaining problems at certain points, notably the so-called 100,000-year problem. For relatively recent periods, data from radiocarbon dating and dendrochronology also support the MIS data. The sediments also acquire depositional remanent magnetization, which allows them to be correlated with Earth's geomagnetic reversals. For older core samples, individual annual depositions cannot usually be distinguished, and dating is taken from the geomagnetic information in the cores. Other information, especially as to the ratios of gases such as carbon dioxide in the atmosphere, is provided by analysis of ice cores.
+The SPECMAP Project, funded by the US National Science Foundation, has produced one standard chronology for oxygen isotope records, although there are others. This high-resolution chronology was derived from several isotopic records; the composite curve was then smoothed, filtered and tuned to the known cycles of the astronomical variables. The use of a number of isotopic profiles was designed to eliminate "noise" errors that could have been contained within a single isotopic record. Another large research project funded by the US government in the 1970s and 1980s was Climate: Long range Investigation, Mapping, and Prediction (CLIMAP), which to a large degree succeeded in its aim of producing a map of the global climate at the Last Glacial Maximum, some 18,000 years ago, with some of the research also directed at the climate some 120,000 years ago, during the last interglacial. The theoretical advances and greatly improved data available by the 1970s enabled a "grand synthesis" to be made, best known from the 1976 paper "Variations in the Earth's Orbit: Pacemaker of the Ice Ages" (in Science), by J.D. Hays, Shackleton and John Imbrie, which is still widely accepted, and covers the MIS timescale and the causal effect of the orbital theory.
+In 2010 the Subcommission on Quaternary Stratigraphy of the International Commission on Stratigraphy dropped other lists of MIS dates and started using the Lisiecki & Raymo (2005) LR04 Benthic Stack, as updated. This was compiled by Lorraine Lisiecki and Maureen Raymo.
+
+== Stages ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Marine_isotope_stages-1.md b/data/en.wikipedia.org/wiki/Marine_isotope_stages-1.md
new file mode 100644
index 000000000..123582955
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Marine_isotope_stages-1.md
@@ -0,0 +1,110 @@
+---
+title: "Marine isotope stages"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Marine_isotope_stages"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:34.928910+00:00"
+instance: "kb-cron"
+---
+
+The following are the start dates (apart from MIS 5 sub-stages) of the most recent MIS (Lisiecki & Raymo 2005, LR04 Benthic Stack). The figures, in thousands of years ago, are from Lisiecki's website. Numbers for substages in MIS 5 denote peaks of substages rather than boundaries.
+
+MIS     Start date
+MIS 1 – 14 kya, end of the Younger Dryas marks the start of the Holocene. The LR04 date of 14 kya had to accommodate less well studied time intervals, and the generally accepted date of 11.7 kya is to be preferred.
+MIS 2 – 29 (Last Glacial Maximum)
+MIS 3 – 57 (MIS 5d to MIS 2 is called the Last Glacial Period, Wisconsinan glaciation in North America, Weichselian glaciation in northern Europe)
+MIS 4 – 71
+MIS 5 – 130, usually sub-divided into a to e:
+MIS 5a – 82 (peak of interglacial sub-stage)
+MIS 5b – 87 (peak of glacial sub-stage)
+MIS 5c – 96 (peak of interglacial sub-stage)
+MIS 5d – 109 (peak of glacial sub-stage)
+MIS 5e – 123 (peak of Last Interglacial, also known as the Eemian among other names)
+MIS 6 – 191 (Penultimate Glacial Period, also called Illinoian glacial in North America, later Saalian in northern Europe and later Wolstonian in Britain)
+MIS 7 – 243 (Aveley Interglacial in Britain)
+MIS 8 – 300 (early Wolstonian in Britain)
+MIS 9 – 337 (Purfleet Interglacial in Britain)
+MIS 10 – 374
+MIS 11 – 424 (Hoxnian Interglacial in Britain, and Holstein Interglacial in Central Europe)
+MIS 12 – 478 (Anglian Glacial in Britain, Elster glaciation in northern Europe)
+MIS 13 – 524
+MIS 14 – 563
+MIS 15 – 621
+MIS 16 – 676
+MIS 17 – 712
+MIS 18 – 761
+MIS 19 – 790 (Brunhes–Matuyama reversal)
+MIS 20 – 814
+MIS 21 – 866
+The list continues to MIS 104, beginning 2.614 million years ago.
+
+== Older versions ==
+The following are the start dates of the most recent MIS, in kya (thousands of years ago). The first figures are derived by Aitken & Stokes from Bassinot et al. (1994), with the figures in parentheses alternative estimates from Martinson et al. for stage 4 and for the others the SPECMAP figures in Imbrie et al. (1984). For stages 1–16 the SPECMAP figures are within 5 kya of the figures given here. All figures up to MIS 21 are taken from Aitken & Stokes, Table 1.4, except for the sub-stages of MIS 5, which are from Wright's Table 1.1.
+
+MIS 1 – 11 kya, end of the Younger Dryas marks the start of the Holocene, continuing to the present
+MIS 2 – 24 near Last Glacial Maximum
+MIS 3 – 60
+MIS 4 – 71 (74)
+MIS 5 – 130, includes the Eemian; usually sub-divided into 5a to 5e:
+MIS 5a – 84–74
+MIS 5b – 92–84
+MIS 5c – 105–92
+MIS 5d – 115–105
+MIS 5e – 130–115
+MIS 6 – 190
+MIS 7 – 244
+MIS 8 – 301
+MIS 9 – 334
+MIS 10 – 364
+MIS 11 – 427, the most similar to MIS 1.
+MIS 12 – 474
+MIS 13 – 528
+MIS 14 – 568
+MIS 15 – 621
+MIS 16 – 659
+MIS 17 – 712 (689)
+MIS 18 – 760 (726)
+MIS 19 – 787 (736)
+MIS 20 – 810 (763)
+MIS 21 – 865 (790)
+Some older stages, in mya (millions of years ago):
+
+MIS 22 – 1.03 mya, marking the end of the Bavelian period in Europe
+MIS 62 – 1.75, end of the Tiglian
+MIS 103 – 2.588, end of the Pliocene and start of the Pleistocene, on the INQUA time scale (older definitions put this change at 1.806 mya – the MIS date is unaffected)
+
+== See also ==
+Timeline of glaciation
+Geologic temperature record
+Paleothermometer
+Anthropocene
+Marine terrace
+Ice core
+
+== Notes ==
+
+== Citations ==
+
+== References ==
+Aitken, Martin J and Stokes, Stephen, in Taylor, Royal Ervin and Aitken, Martin Jim (eds), Chronometric dating in archaeology, Chapter 1, 1997, Birkhäuser, ISBN 0-306-45715-6, ISBN 978-0-306-45715-9
+Andrews, John T. (2000). "Dating Glacial Events and Correlation to Global Climate Change". In Jay Stratton Noller; Janet M. Sowers; William R. Lettis (eds.). Quaternary Geochronology: Methods and Applications. AGU Reference Shelf. American Geophysical Union. pp. 447–455. doi:10.1029/RF004p0447. ISBN 978-1-118-66848-1. ISBN 0-87590-950-7, ISBN 978-0-87590-950-9
+"Concise", Ogg, James George, Ogg, Gabi, Gradstein F. M., The Concise Geologic Time Scale, 2008, Cambridge University Press, 2008, ISBN 0-521-89849-8, ISBN 978-0-521-89849-2
+Cronin, Thomas M., Paleoclimates: understanding climate change past and present, Columbia University Press, 2010, ISBN 0-231-14494-6, ISBN 978-0-231-14494-0
+Pettit, Paul; White, Mark (2012). The British Palaeolithic: Human Societies at the Edge of the Pleistocene World. Abingdon, UK: Routledge. ISBN 978-0-415-67455-3.
+Sowers, Janet M. (2000). "Correlating Quaternary Landforms and Deposits to Global Climate Change". In Jay Stratton Noller; Janet M. Sowers; William R. Lettis (eds.). Quaternary Geochronology: Methods and Applications. AGU Reference Shelf. American Geophysical Union. pp. 425–426. doi:10.1029/RF004p0425. ISBN 978-1-118-66848-1. ISBN 0-87590-950-7, ISBN 978-0-87590-950-9
+Wright, James D. (2000). "Global Climate Change in Marine Stable Isotope Records". In Jay Stratton Noller; Janet M. Sowers; William R. Lettis (eds.). Quaternary Geochronology: Methods and Applications. AGU Reference Shelf. American Geophysical Union. pp. 427–433. doi:10.1029/RF004p0427. ISBN 978-1-118-66848-1. ISBN 0-87590-950-7, ISBN 978-0-87590-950-9
+
+== Further reading ==
+Cohen, K.M. and Gibbard, P.L., Global chronostratigraphical correlation table for the last 2.7 million years (updated version 2011), Subcommission on Quaternary Stratigraphy, International Commission on Stratigraphy: Cambridge.
+
+== External links ==
+Marine Isotope Substage 5e and the Eemian Interglacial, NJ Shackleton, 2003
+650,000 years of greenhouse gas concentrations, RealClimate, 2005
+Glacial variability over the last two million years, P Huybers, 2007
+The polar paleoclimate signature of Marine Isotope Stage 31, Reed Scherer, 2007
+Oceanic forcing of the Marine Isotope Stage 11 interglacial, Alexander J. Dickson,   Christopher J. Beer,   Ciara Dempsey,   Mark A. Maslin,   James A. Bendle,   Erin L. McClymont  &  Richard D. Pancost, 2009
+Last Time Carbon Dioxide Levels Were This High: 15 Million Years Ago, Aradhna Tripati, 2009
+US NCDC
+NASA SPECMAP
+Global chronostratigraphical correlation table for the last 2.7 million years, v.2010, International Commission on Stratigraphy
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Mesocosm-0.md b/data/en.wikipedia.org/wiki/Mesocosm-0.md
new file mode 100644
index 000000000..48d4abe5b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Mesocosm-0.md
@@ -0,0 +1,38 @@
+---
+title: "Mesocosm"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Mesocosm"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:40.253813+00:00"
+instance: "kb-cron"
+---
+
+A mesocosm (meso- or 'medium' and -cosm 'world') is any outdoor or indoor experimental system that examines the natural environment under controlled conditions. In this way mesocosm studies provide a link between field surveys and highly controlled laboratory experiments.
+Mesocosms tend to be medium-sized to large (e.g., aquatic mesocosms range from 1 litre (34 US fl oz) to more than 10,000 litres (2,600 US gal)) and contain multiple trophic levels of interacting organisms.
+In contrast to laboratory experiments, mesocosm studies are normally conducted outdoors in order to incorporate natural variation (e.g., diel cycles).  Mesocosm studies may be conducted in either an enclosure that is small enough that key variables can be brought under control or by field-collecting key components of the natural environment for further experimentation. In coastal and aquatic ecology, specialized mesocosms can also be designed to reproduce wave climates, allowing experiments on hydrodynamic effects on organisms, sediments, and biogeomorphic processes. 
+Extensive mesocosm studies have been conducted to evaluate how organisms or communities might react to environmental change, through deliberate manipulation of environmental variables, such as increased temperature, carbon dioxide or pH levels.
+
+
+== Advantages ==
+
+The advantage of mesocosm studies is that environmental gradients of interest (e.g., warming temperatures) can be controlled or combined to separate and understand the underlying mechanism(s) affecting the growth or survival of species, populations or communities of interest. By manipulating gradients (e.g., climate variables) mesocosm studies can extend beyond available data helping to build better models of the effects of different scenarios. Mesocosm experiments also tend to include replication of different treatment levels.
+Manipulating a variable can indicate what to expect if a comparable change were to occur in that ecosystem or environment. For indoor mesocosms, growth chambers grant greater control over the experiment. When plants are placed in a growth chamber, the air, temperature, heat and light distribution can be manipulated, and the effects of exposure to different amounts of each factor can be observed.
+Greenhouses are also used in mesocosm studies, although they can sometimes alter the climate inside the enclosure, interfering with the experiment and producing unreliable data.
+
+
+== Disadvantages ==
+Using growth chambers for a laboratory experiment is sometimes a disadvantage because of the limited amount of space.
+Another disadvantage of mesocosms is that they may not adequately imitate the natural environment, so that an organism does not react as it would in its original environment.
+
+
+== Examples ==
+
+Mazzeo and colleagues examined the eating habits of Hoplias malabaricus fish when exposed to different amounts of phytoplankton, zooplankton, and competition. For three months before the experiment, they maintained average precipitation, air temperature, and an overall subtropical environment. They filled 12 units with aquifer water, sand and plants and kept them in isolation until the environment became suitable for phytoplankton to emerge. After this preparation, Mazzeo et al. divided the units into a control (zooplankton and phytoplankton) and three treatments: Jenynsia multidentata with zooplankton and phytoplankton; juvenile Hoplias malabaricus with zooplankton and phytoplankton; and large Hoplias malabaricus with Jenynsia multidentata, zooplankton, and phytoplankton. They then observed differences in biomass among the conditions.
+Flanagan and McCauley tested the effects of climate warming on carbon dioxide concentration in shallow ponds by creating eight cylindrical in situ mesocosms. They divided them into four controls and four experiments in the University of Calgary's campus pond. The mesocosms were open at the bottom and were submerged to the same depth as the pond. Because the sediments and temperature were carefully protected from unintended changes, zooplankton and algae were produced successfully. After manipulation (pumping heat into the water), they measured the carbon dioxide concentration in the sediments at the bottom of the pond. After collecting and analyzing the data, Flanagan and McCauley concluded that warming of the pond environment increases the release of carbon dioxide from the pond into the surroundings, in turn decreasing the amount of carbon dioxide within the sediments and indirectly modifying the carbon cycle of that ecosystem.
+
+Mesocosms are useful for studying the fate of pollutants in marine environments as well as providing the ability to conduct controlled manipulative experiments that could not be undertaken in natural marine environments.  Since 1976, the Marine Ecosystems Research Laboratory (MERL) at the University of Rhode Island has been conducting pollution studies and experimental marine ecological studies using mesocosm tanks drawing water from nearby Narragansett Bay.  
+Mesocosms have also been used to study how the diversification of three-spined sticklebacks influences trophic communities and other ecosystem processes.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Miyake_event-0.md b/data/en.wikipedia.org/wiki/Miyake_event-0.md
new file mode 100644
index 000000000..a3e03cf4e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Miyake_event-0.md
@@ -0,0 +1,41 @@
+---
+title: "Miyake event"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Miyake_event"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:36.116294+00:00"
+instance: "kb-cron"
+---
+
+A Miyake event is an observed sharp enhancement of the production of cosmogenic isotopes by cosmic rays. It can be marked by a spike in the concentration of radioactive carbon isotope 14C in tree rings, as well as 10Be and 36Cl in ice cores, which are all independently dated. At present, five significant events are known (7176 BCE, 5259 BCE, 664-663 BCE (historically referred to as 660 BCE), 774 CE, 993 CE) for which the spike in 14C is quite remarkable, i.e. above 1% rise over a period of two years, and four more events (12,350 BCE, 5410 BCE, 1052 CE, 1279 CE) need independent confirmation. It is not known how often Miyake events occur, but from the available data it is estimated to be every 400 to 2,400 years. A Miyake event occurring in modern conditions would cause severe damage to global technological infrastructure such as satellites, telecommunications, and power grids.
+There is strong evidence that Miyake events are caused by extreme solar particle events and they are likely related to super-flares discovered on solar-like stars.  Although Miyake events are based on extreme year-to-year rises of 14C concentration, the duration of the periods over which the 14C levels increase or stay at high levels is longer than one year. However, a universal cause and origin of all the events is not yet established, and some of the events may be caused by other phenomena coming from beyond the Solar System, such as a gamma-ray burst.
+A 2023 study dated the largest known Miyake event between 12,350 and 12,349 BCE, identified by an international team who measured radiocarbon levels in ancient trees recovered from the eroded banks of the Drouzet River, near Gap in the Southern French Alps. 
+Although the 14C increase was nearly double that for the next strongest spike in 774 CE, the strength of the corresponding solar event was only 18% higher, because of the combined effect of the lower atmospheric CO2 level and weaker geomagnetic field. However, this event has not yet been independently confirmed in wood from other regions, nor is it reliably supported by a clear corresponding spike in other isotopes, such as beryllium-10, that are needed to reconstruct the spectrum of solar energetic particles.
+
+
+== Discovery ==
+The events are named after the Japanese physicist Fusa Miyake who, as a doctoral student, was the first to identify these radiocarbon spikes and published the results with co-authors in 2012 in the journal Nature. The investigation at that time found a strong 14C increase in the annual rings of Japanese cedars for the years 774/775. The event of 775 was independently discovered, using the low-resolution IntCal data.
+In 2013, Miyake and co-authors published the discovery of another similar radiocarbon spike in the years 993/994. In December 2013, Miyake received her Doctor of Science degree from Nagoya University.
+
+
+== Time benchmark ==
+After a Miyake event is well-studied and confirmed, it can serve as a reference time benchmark, a "year-stamp", enabling more precise dating of historical buildings, objects, and events. Six diverse historical occurrences, from archaeological sites to natural disasters, have thus been dated to a specific year, using Miyake events as benchmarks and counting subsequent annual tree rings. For example, the Miyake event of 993 has been identified in woodwork from the Viking archaeological site at L'Anse aux Meadows, in Newfoundland, and counting later tree rings has shown that the wood is from a tree felled in 1021, and thus that Europeans had reached the Americas by that date. Another study, performed on the tree rings of wooden building remains from the Neolithic waterlogged site of Dispilio in north-western Greece, identified the Miyake event of 5259 BCE, thus for the first time absolutely dating a Neolithic site in Europe from the 6th millennium BCE to a single calendar year.
+
+
+== See also ==
+Carrington Event
+Coronal mass ejection
+Dendrochronology
+Geomagnetic storm
+Solar storm
+
+
+== References ==
+
+
+== External links ==
+Researchers succeed for first time in accurately dating a 7,000-year-old prehistoric settlement using cosmic rays – May 21, 2024 – University of Bern
+"Young Researcher in the Spotlight: Fusa Miyake at the Solar-Terrestrial Environment Laboratory". Nagoya University. 27 May 2013. Retrieved 17 October 2023.
+Carlson, Erika K. (29 May 2020). "Sun's Past Hidden in Tree Rings • Physicist Fusa Miyake measures isotope abundances in ancient tree rings to uncover solar eruptions from thousands of years ago". Physics. 13. Physics 13, 78: 78. Retrieved 17 October 2023. Q&A
+"José A. Boninsegna Frontiers in Dendrochronology Award was given to Associate Professor Fusa Miyake". Institute for Space-Earth Environmental Research, Nagoya University. 26 July 2022. Retrieved 17 October 2023.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-0.md b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-0.md
new file mode 100644
index 000000000..7ecd61a6b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-0.md
@@ -0,0 +1,41 @@
+---
+title: "Molten-Salt Reactor Experiment"
+chunk: 1/4
+source: "https://en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:41.440510+00:00"
+instance: "kb-cron"
+---
+
+The Molten-Salt Reactor Experiment (MSRE) was an experimental molten-salt research reactor at the Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee. The technology was researched through the 1960s; the reactor was constructed by 1964, went critical in 1965, and was operated until 1969. The costs of a cleanup project were estimated at $130 million.
+Initially designed for 15 MWth, the MSRE was operated at 7.4 MWth because of imprecise nuclear cross section data. It was a test reactor simulating the neutronic "kernel" of a type of inherently safer epithermal thorium breeder reactor called the liquid fluoride thorium reactor. It primarily used two fuels: first uranium-235 and later uranium-233. The latter 233UF4 was the result of breeding from thorium in other reactors. Since this was an engineering test, the large, expensive breeding blanket of thorium salt was omitted in favor of neutron measurements.
+In the MSRE, the heat from the reactor core was shed via a cooling system using air blown over radiators. It is thought similar reactors could power high-efficiency heat engines such as closed-cycle gas turbines. The MSRE's piping, core vat and structural components were made from Hastelloy-N, and its moderator was a pyrolytic graphite core. The fuel for the MSRE was LiF-BeF2-ZrF4-UF4 (65-29.1-5-0.9 mole %). The secondary coolant was FLiBe (2LiF-BeF2). The reactor operated at temperatures as high as 650 °C and ran for the equivalent of about 1.5 years of full-power operation.
+The result promised to be a simple, reliable reactor. The purpose of the Molten-Salt Reactor Experiment was to demonstrate that some key features of the proposed molten-salt power reactors could be embodied in a practical reactor that could be operated safely and reliably and be maintained without excessive difficulty. For simplicity, it was to be a fairly small, one-fluid (i.e. non-breeding) reactor operating at 10 MWth or less, with heat rejection to the air via a secondary (fuel-free) salt.
+
+== Reactor description ==
+
+=== Core ===
+
+The pyrolytic graphite core, grade CGB, also served as the moderator. Before the MSRE development began, tests had shown that salt would not permeate graphite in which the pores were on the order of a micrometer. However, graphite with the desired pore structure was available only in small, experimentally prepared pieces, and when a manufacturer set out to produce a new grade (CGB) to meet the MSRE requirements, difficulties were encountered.
+
+=== Fuel ===
+The fuel was 7LiF-BeF2-ZrF4-UF4 (65-29.1-5-0.9 mole %). The first fuel was 33% 235U; later a smaller amount of 233UF4 was used. By 1960 a better understanding of fluoride salt based molten-salt reactors had emerged from earlier molten salt reactor research for the Aircraft Reactor Experiment. Fluoride salts are strongly ionic, and when melted they are stable at high temperatures, low pressures, and high radiation fluxes. Stability at low pressure permits less robust reactor vessels and increases reliability. The high reactivity of fluorine traps most fission reaction byproducts. It appeared that the fluid salt would permit on-site chemical separation of the fuel and wastes.
+
+The fuel system was located in sealed cells, laid out for maintenance with long-handled tools through openings in the top shielding. A tank of LiF-BeF2 salt was used to flush the fuel circulating system before and after maintenance. In a cell adjacent to the reactor was a simple facility for bubbling gas through the fuel or flush salt: a hydrogen–hydrogen fluoride mixture, in roughly 10:1 ratio, to remove oxide, and fluorine to remove uranium as uranium hexafluoride.
+The secondary coolant was LiF-BeF2 (66–34 mole %).
+
+=== Pump ===
+The bowl of the fuel pump was the surge space for the circulating loop, and here about 50 US gallons per minute (190 L/min) of fuel was sprayed into the gas space to allow xenon and krypton to escape from the salt. Removing xenon-135, the most significant neutron poison, made the reactor safer and easier to restart. In solid-fuel reactors, on restart the 135Xe in the fuel absorbs neutrons, followed by a sudden jump in reactivity as the 135Xe is burned out. After a shutdown, conventional reactors that are not restarted immediately may have to wait hours until the xenon-135 decays (the so-called iodine pit).
+Also in the pump bowl was a port through which salt samples could be taken or capsules of concentrated fuel-enriching salt (UF4-LiF or PuF3) could be introduced.
+
+=== Air-cooled heat exchangers ===
+
+At the time, the high temperatures were seen almost as a disadvantage because they hampered use of conventional steam turbines. Now, such temperatures are seen as an opportunity to use high-efficiency closed-cycle gas turbines. After two months of high-power operation, the reactor was down for 3 months because of the failure of one of the main cooling blowers.
+
+=== Neutronics and thermal-hydraulics ===
+The reactor experienced stable neutronic operation. If temperatures increased or bubbles formed, the volume of the fluid fuel salts would increase and some fluid fuel salts would be forced out of the core, thereby reducing the reactivity. The MSRE development program did not include reactor physics experiments or heat transfer measurements. There was enough latitude in the MSRE that deviations from predictions would not compromise safety or accomplishment of the objectives of the experimental reactor.
+
+=== Building grounds ===
+
+Construction of the primary system components and alterations of the old Aircraft Reactor Experiment building (which had been partly remodeled for a proposed 60 MWth aircraft reactor) were started in 1962. Installation of the salt systems was completed in mid-1964. ORNL was responsible for quality assurance, planning, and management of construction. The primary systems were installed by ORNL personnel; subcontractors modified the building and installed ancillary systems.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-1.md b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-1.md
new file mode 100644
index 000000000..7013edada
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-1.md
@@ -0,0 +1,25 @@
+---
+title: "Molten-Salt Reactor Experiment"
+chunk: 2/4
+source: "https://en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:41.440510+00:00"
+instance: "kb-cron"
+---
+
+=== Structural alloy Hastelloy-N ===
+Hastelloy-N, a low-chromium nickel–molybdenum alloy, was used in the MSRE and proved compatible with the fluoride salts FLiBe and FLiNaK. All metal parts contacting salt were made of Hastelloy-N. Hastelloy-N was chosen for the MSRE on the basis of the promising results of tests at aircraft nuclear propulsion conditions and the availability of much of the required metallurgical data. Development for the MSRE generated the further data required for ASME code approval. It also included preparation of standards for Hastelloy-N procurement and for component fabrication.
+Almost 200,000 lb (90,000 kg) of material for the MSRE, in a variety of shapes, was produced commercially. Requests for bids on component fabrication went to several companies in the nuclear fabrication industry, but all declined to submit lump-sum bids because of lack of experience with the new alloy. Consequently, all major components were fabricated in U.S. Atomic Energy Commission-owned shops at Oak Ridge and Paducah, Kentucky.
+At the time that design stresses were set for the MSRE, the data that was available indicated that the strength and creep rate of Hastelloy-N were hardly affected by irradiation. After the construction was well along, the stress-rupture life and fracture strain were found to be drastically reduced by thermal neutron irradiation. The MSRE stresses were reanalyzed, and it was concluded that the reactor would have adequate life to reach its goals. At the same time a program was launched to improve the resistance of Hastelloy-N to the embrittlement.
+An out-of-pile corrosion test program was carried out for Hastelloy-N, which indicated extremely low corrosion rates at MSRE conditions. Capsules exposed in the Materials Testing Reactor showed that salt fission power densities of more than 200 W/cm3 had no adverse effects on compatibility of fuel salt, Hastelloy-N, and graphite. Fluorine gas was found to be produced by radiolysis of frozen salts, but only at temperatures below about 212 °F (100 °C).
+Components that were developed especially for the MSRE included flanges for 5-inch (130 mm) lines carrying molten salt, freeze valves (an air-cooled section where salt could be frozen and thawed), flexible control rods to operate in thimbles at 1,200 °F (649 °C), and the fuel sampler-enricher. Centrifugal pumps were developed similar to those used successfully in the aircraft reactor program, but with provisions for remote maintenance, and including a spray system for xenon removal. Remote maintenance considerations pervaded the MSRE design, and developments included devices for remotely cutting and brazing together 1+1⁄2-inch (38 mm) pipe, removable heater-insulation units, and equipment for removing specimens of metal and graphite from the core.
+
+=== Development and construction ===
+Most of the MSRE effort from 1960 through 1964 was devoted to design, development, and construction of the reactor. Production and further testing of graphite and Hastelloy-N, both in-pile and out, were major development activities. Others included work on reactor chemistry, development of fabrication techniques for Hastelloy-N, development of reactor components, and remote-maintenance planning and preparations.
+
+== Operation ==
+
+The MSRE operated for 5 years. The salt was loaded in 1964 and nuclear operation ended in December 1969; all the objectives of the experiment were achieved during this period.
+Checkout and prenuclear tests included 1,000 hours of circulation of flush salt and fuel carrier salt. Nuclear testing of the MSRE began in June 1965, with the addition of enriched 235U as UF4-LiF eutectic to the carrier salt to make the reactor critical. After zero-power experiments to measure rod worth and reactivity coefficients, the reactor was shut down and final preparations made for power operation. Power ascension was delayed when vapors from oil that had leaked into the fuel pump were polymerized by the radioactive offgas and plugged gas filters and valves. Maximum power, which was limited to 7.4 MWth by the capability of the heat-rejection system, was reached in May 1966.
+After two months of high-power operation, the reactor was down for three months because of the failure of one of the main cooling blowers. Some further delays were encountered because of offgas line plugging, but by the end of 1966 most of the startup problems had been overcome. During the next 15 months, the reactor was critical 80% of the time, with runs of 1, 3, and 6 months that were uninterrupted by a fuel drain. By March 1968, the original objectives of the MSRE had been accomplished, and nuclear operation with 235U was concluded.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-2.md b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-2.md
new file mode 100644
index 000000000..cb4ecfc96
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-2.md
@@ -0,0 +1,58 @@
+---
+title: "Molten-Salt Reactor Experiment"
+chunk: 3/4
+source: "https://en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:41.440510+00:00"
+instance: "kb-cron"
+---
+
+By this time, ample 233U had become available, so the MSRE program was extended to include substitution of 233U for the uranium in the fuel salt, and operation to observe the new nuclear characteristics. Using the on-site processing equipment, the flush salt and fuel salt were fluorinated to recover the uranium in them as UF6. 233UF4-LiF eutectic was then added to the carrier salt, and in October 1968, the MSRE became the world's first reactor to operate on 233U.
+The 233U zero-power experiments and dynamics tests confirmed the predicted neutronic characteristics. An unexpected consequence of processing the salt was that its physical properties were altered slightly so that more than the usual amount of gas was entrained from the fuel pump into the circulating loop. The circulating gas and the power fluctuations that accompanied it were eliminated by operating the fuel pump at slightly lower speed. Operation at high power for several months permitted accurate measurement of the capture-to-fission ratio for 233U in this reactor, completing the objectives of the 233U operation.
+In the concluding months of operation, xenon stripping, deposition of fission products, and tritium behavior were investigated. The feasibility of using plutonium in molten-salt reactors was emphasized by adding PuF3 as makeup fuel during this period.
+After the final shutdown in December 1969, the reactor was left in standby for nearly a year. A limited examination program was then carried out, covering a moderator bar from the core, a control rod thimble, heat exchanger tubes, parts from the fuel pump bowl, and a freeze valve that had developed a leak during the final reactor shutdown. The radioactive systems were then closed to await ultimate disposal.
+
+=== Statistics ===
+Parameters and operational statistics:
+Power: 8 MW (thermal)
+output: 92.8 GWh
+equivalent full-power: 11,555 h
+Fuel salt: fluoride
+cations: 65% Li-7, 29.1% Be, 5% Zr, 0.9% U
+weight: 11,260 lb (5,107 kg)
+melting temp: 813 °F (434 °C)
+inlet temp: 1,175 °F (635 °C)
+outlet temp: 1,225 °F (663 °C)
+flow rate: 400 gal/min (1,514 L/min)
+fuel pump circulating: 19,405 h
+Coolant salt: fluoride
+cations: 66% Li-7, 34% Be
+weight: 15,300 lb (6,940 kg)
+coolant pump circulating: 23,566 h
+Moderator: nuclear graphite
+Container: Hastelloy-N
+First fuel: U-235
+first critical: 1 June 1965
+thermal output: 72,441 MWh
+critical hours: 11,515 h
+full-power output equivalent: 9,006 h
+Second fuel: U-233
+critical: 2 October 1968
+thermal output: 20,363 MWh
+critical hours: 3,910 h
+full-power output equivalent: 2,549 h
+Shutdown: December 1969
+
+== Results ==
+The broadest and perhaps most important conclusion from the MSRE experience was that a molten salt fueled reactor concept was viable. It ran for considerable periods of time, yielding valuable information, and maintenance was accomplished safely and without excessive delay.
+The MSRE confirmed expectations and predictions. For example, it was demonstrated that: the fuel salt was immune to radiation damage, the graphite was not attacked by the fuel salt, and the corrosion of Hastelloy-N was negligible. Noble gases were stripped from the fuel salt by a spray system, reducing the 135Xe poisoning by a factor of about 6. The bulk of the fission product elements remained stable in the salt. Additions of uranium and plutonium to the salt during operation were quick and uneventful, and recovery of uranium by fluorination was efficient. The neutronics, including critical loading, reactivity coefficients, dynamics, and long-term reactivity changes, agreed with prior calculations.
+In other areas, the operation resulted in improved data or reduced uncertainties. The 233U capture-to-fission ratio in a typical MSR neutron spectrum is an example of basic data that was improved. The effect of fissioning on the redox potential of the fuel salt was resolved. The deposition of some elements ("noble metals") was expected, but the MSRE provided quantitative data on relative deposition on graphite, metal, and liquid-gas interfaces. Heat transfer coefficients measured in the MSRE agreed with conventional design calculations and did not change over the life of the reactor. Limiting oxygen in the salt proved effective, and the tendency of fission products to be dispersed from contaminated equipment during maintenance was low.
+Operation of the MSRE provided insights into the problem of tritium in a molten-salt reactor. It was observed that about 6–10% of the calculated 54 Ci/day (2.0 TBq/day) production diffused out of the fuel system into the containment cell atmosphere and another 6–10% reached the air through the heat removal system. The fact that these fractions were not higher indicated that something partially negated the transfer of tritium through hot metals.
+One unexpected finding was inter-granular cracking in all metal surfaces exposed to the fuel salt. The cause of the embrittlement was tellurium, a fission product generated in the fuel. This was first noted in the specimens that were removed from the core at intervals during the reactor operation. Post-operation examination of pieces of a control-rod thimble, heat-exchanger tubes and pump bowl parts revealed the ubiquity of the cracking and emphasized its importance to the MSR concept. The crack growth was rapid enough to become a problem over the planned 30-year life of a follow-on thorium breeder reactor. This cracking could be reduced in the short term by adding small amounts of niobium to the Hastelloy-N. However, further studies were needed to assess the effects of longer exposure times and some interaction parameters for the mixtures used.
+The operation experience gained with the MSRE showed that the following areas require further investigation for the successful operation of a commercial MSR:
+
+Maintaining the salt as a liquid in all parts of the primary system, particularly in extremities far from the core.
+Tight control of tritium production and transport from the core (in the MSRE, less than 20% left the fuel system by diffusion and through the heat removal system).
+Reduction in growth of inter-granular cracks in exposed metal surfaces (due to tellurium, a fission product of uranium).
+Decommissioning and disposal of the reactor structure and waste salt (approximate costs in 2019 were $10 million per year).
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-3.md b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-3.md
new file mode 100644
index 000000000..5e4c99260
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment-3.md
@@ -0,0 +1,35 @@
+---
+title: "Molten-Salt Reactor Experiment"
+chunk: 4/4
+source: "https://en.wikipedia.org/wiki/Molten-Salt_Reactor_Experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:41.440510+00:00"
+instance: "kb-cron"
+---
+
+== Decommissioning ==
+After shutdown, the salt was believed to be in long-term safe storage. At low temperatures, radiolysis can free fluorine from the salt. As a countermeasure, the salt was annually reheated to about 302 °F (150 °C) until 1989. Beginning in the mid-1980s, however, there was concern that radioactivity was migrating through the system; the concern was reported by an ORNL employee who was among 125 people working above the reactor, which had not been decontaminated or decommissioned. Department of Energy Oak Ridge Operations Manager Joe Ben LaGrone ordered the evacuation of the 125 employees, based on findings reported to him by inspector William Dan DeFord, P.E.
+Sampling in 1994 revealed concentrations of uranium that created a potential for a nuclear criticality accident, as well as a potentially dangerous build-up of fluorine gas: the environment above the solidified salt was approximately one atmosphere of fluorine. The ensuing decontamination and decommissioning project was called "the most technically challenging" activity assigned to Bechtel Jacobs under its environmental management contract with the U.S. Department of Energy's Oak Ridge Operations organization.
+In 2003, decommissioning was expected to be completed in 2009, and the cost of the MSRE cleanup project was estimated at $130 million. Removal of uranium from the salt was completed in March 2008, but the salt, with its fission products, remained in the tanks. Much of the high cost was caused by the unpleasant surprise of fluorine and uranium hexafluoride evolution from cold fuel salt in storage that ORNL did not defuel and store correctly, but this has now been taken into consideration in MSR design.
+A potential decommissioning process has been described: uranium is to be removed from the fuel as the hexafluoride by adding excess fluorine, and plutonium as plutonium dioxide by adding sodium carbonate.
+
+As of 2019, the MSRE is in a SAFESTOR state, meaning it is still intact but shut down and actively monitored and maintained.
+
+== See also ==
+Thorium fuel cycle
+Fuji MSR
+Thorium-based nuclear power
+
+== References ==
+
+Briggs, R. B. (1964). "MSR Program Semiannual Progress Report for the period ending July 31, 1964". ORNL-3708, Oak Ridge National Laboratory, U.S. AEC (published November 1964). Retrieved 2008-05-21.
+
+== Further reading ==
+MSRE Safety analysis
+
+== External links ==
+"The Molten-Salt Reactor Experiment" (1969) Oak Ridge National Laboratory on YouTube, a film published by Atomic Energy Commission
+Alvin Weinberg's Molten Salt Reactor Experiment on YouTube
+An Account of Oak Ridge National Laboratory’s Thirteen Nuclear Reactors (from ORNL; includes a section on the MSRE)
+2015 Workshop on Molten Salt Reactor Technologies ("Commemorating the 50th Anniversary of the Startup of the MSRE"), including a 50th anniversary brochure, posters, and a history of the ORNL molten salt program
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Nucleocosmochronology-0.md b/data/en.wikipedia.org/wiki/Nucleocosmochronology-0.md
new file mode 100644
index 000000000..d346839d4
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Nucleocosmochronology-0.md
@@ -0,0 +1,34 @@
+---
+title: "Nucleocosmochronology"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Nucleocosmochronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:37.317730+00:00"
+instance: "kb-cron"
+---
+
+Nucleocosmochronology, or nuclear cosmochronology, is a technique used to determine timescales for astrophysical objects and events based on observed ratios of radioactive heavy elements and their decay products. It is similar in many respects to radiometric dating, in which trace radioactive impurities were selectively incorporated into materials when they were formed.
+To calculate the age of formation of astronomical objects, the observed ratios of abundances of heavy radioactive and stable nuclides are compared to the primordial ratios predicted by nucleosynthesis theory. Both radioactive elements and their decay products matter, and some important elements include the long-lived radioactive nuclei Th-232, U-235, and U-238, all formed by the r-process. The process has been compared to radiocarbon dating. The ages of the objects are determined by placing constraints on the duration of nucleosynthesis in the galaxy.
+Nucleocosmochronology has been employed to determine the age of the Sun (4.57±0.02 billion years) and of the Galactic thin disk (8.8±1.8 billion years), among other objects. It has also been used to estimate the age of the Milky Way itself by studying Cayrel's Star in the Galactic halo, which, due to its low metallicity, is believed to have formed early in the history of the Galaxy.
+Limiting factors in its precision are the quality of observations of faint stars and the uncertainty of the primordial abundances of r-process elements.
+
+
+== History ==
+The first use of nuclear cosmochronology was in 1929, by Ernest Rutherford, who, shortly after the discovery that uranium has two naturally occurring radioactive isotopes with different half-lives, attempted to use the ratio to determine when the uranium had been produced. He suggested that both had been produced in equal abundances, assuming they had been produced in a single moment in time, and applied an argument based on incorrect assumptions about astrophysics to derive an incorrect age of about 6 billion years. He pioneered the idea that age could be calculated by the ratio of abundances of radioactive parent elements and their stable decay products.
+According to a tribute written by colleagues, a large part of the modern science of nuclear cosmochronology grew out of work by John Reynolds and his students. 
+Model-independent techniques were developed in 1970.
+
+
+== Technique ==
+It is necessary to know the initial ratios in which nucleosynthesis produces radioactive parent elements relative to the stable elements they decay to, before decay occurs. These are the abundances which the elements would have if the radioactive parent elements were stable, and not producing daughter nuclei. The ratio of the abundance of radioactive elements to the abundance they would have if they were stable is called the remainder. Measurement of the current abundances of elements in objects, combined with nucleosynthesis theory, determines the remainders.
+
+
+== See also ==
+Astrochemistry
+Astronomical chronology
+Geochronology
+Gyrochronology
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Oneida_stirpiculture-0.md b/data/en.wikipedia.org/wiki/Oneida_stirpiculture-0.md
new file mode 100644
index 000000000..255ea7c37
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Oneida_stirpiculture-0.md
@@ -0,0 +1,28 @@
+---
+title: "Oneida stirpiculture"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Oneida_stirpiculture"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:43.771376+00:00"
+instance: "kb-cron"
+---
+
+The stirpiculture experiment at the Oneida Community was the first positive eugenics experiment in American history, resulting in the planned conception, birth and rearing of 58 children. The experiment lasted from 1869 to 1879. It was not considered part of the larger eugenics history because of its radical religious context. The term "stirpiculture" was used by John Humphrey Noyes, founder of the Oneida Community, to refer to his system of eugenics, or the breeding of humans to achieve desired perfections within the species. Noyes derived stirpiculture from the Latin word "stirps", which means "stock, stem, or root" (Carden). It has been claimed that Noyes coined the term two decades before Francis Galton created the term "eugenics". In 1904, Galton claimed that he had first come up with the term and "deliberately changed it for eugenics," a claim supported in print by George Willis Cooke. In his 1883 book Inquiries into Human Faculty and Its Development, Galton noted that his new term "eugenics" was a suitable replacement for the older term "viriculture" that he had invented, suggesting that he had confused the two terms "viriculture" and "stirpiculture."
+
+== Origins of the Oneida stirpiculture experiment ==
+Until the late 1860s, John Humphrey Noyes and his community prevented the unintentional conception of children through their practice of male continence (a type of coitus reservatus). Instead, Noyes and the community believed in only having children with purpose and preparation. In this communal society, it was not simply about the preparedness of the parents, but rather the preparedness of the community to support a new generation. "A mistake was considered a serious detriment to the society" (Kinsley 13). In the early years of the community, when poverty was an issue, the community did not feel adequately prepared to take on the raising and support of children. Therefore, procreation was discouraged in these early days before the financial successes of the community's trap-building manufacturing. An "accidental" conception was thought to be a failure in male continence, the act that was meant to prevent unwanted pregnancies through the withholding of male ejaculation during intercourse. However, accidental conceptions did occur.
+Noyes developed the stirpiculture experiment through his reading and interpretations of Plato, Charles Darwin, Francis Galton and agricultural breeders. Noyes had begun to read Darwin's Principles of Breeding and Sir Francis Galton's papers and books on subjects ranging from anthropology and meteorology to horticulture and eugenics (Circular, Vol II, No. 3, March 27, 1865). Intrigued by these readings, Noyes expanded upon these ideas and considered the potential benefits in the use of scientific propagation to create humans through intentional reproduction rather than haphazard sex.
+
+== The experiment ==
+In 1869, the Oneida Community began its experiment with stirpiculture, which Noyes governed in tandem with a committee. Community men and women were paired owing to their exhibition of superior mental and spiritual qualities. The Circular, a newspaper run by the Oneida Community for the Community, printed several articles outlining Noyes' idea of what the Oneida Community should strive to achieve in its experiment: all of the qualities of Christianity's patriarchs (Abraham's obedience, Jesus as the Son of God).
+
+=== Participants ===
+Noyes was the main judge of the men and women selected to parent children in the experiment, but he also sought the aid of a committee. This committee approved and denied requests of community members to have a child. Many members applied as couples, and some of the couples were actually encouraged by the committee itself. There was a set of standards that each candidate had to meet; older men in the Community were especially sought after according to the community's idea of Ascending Fellowship, as Noyes believed they were much wiser and spiritually sound. Women, on the other hand, were typically between the ages of 20 and 42. Both men and women were chosen based on spiritual and virtuous qualities, as opposed to physical ones. Each potential parent was required to sign a contract committing themselves to the experiment, and most importantly to God and his human representative Noyes (Carden 62). Most important in these pledges were the promises to avoid any "personal feelings in regard to child-bearing" because it was believed that this quality would help them to better serve the experiment and, above all, the Community.
+
+=== Raising the children ===
+Children at Oneida were raised communally, not specifically by their biological parents. They were brought up under the supervision of community "Mothers" and "Fathers" who were assigned the job of childcare in a separate wing of the Oneida Community's Mansion House. Many community members also assisted with childcare.
+The children were raised with access to the countryside and good nutrition, and Oneida was isolated from chronic diseases that might have affected children in more crowded areas. As the children grew, their families and friends encouraged them to go to college and to achieve worldly success. If they decided to attend college, they would board with The New Haven Family sect of the community. In part, this push toward outside education, especially scientific education, would contribute to the breakup of the Oneida Community.
+
+==== The first 15 months ====
+Once a child was born, they stayed with their mother for the first 15 months of life. During this period the mother was allowed and even encouraged to breastfeed the child. Breastfeeding was one of the only instances in which a strong attachment between mother and child was encouraged. This was due to its ability to encompass both scientific and natural views of life.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Oneida_stirpiculture-1.md b/data/en.wikipedia.org/wiki/Oneida_stirpiculture-1.md
new file mode 100644
index 000000000..712bcfd5a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Oneida_stirpiculture-1.md
@@ -0,0 +1,28 @@
+---
+title: "Oneida stirpiculture"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Oneida_stirpiculture"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:43.771376+00:00"
+instance: "kb-cron"
+---
+
+==== The Children's House ====
+Once weaned, children were sent to live in the Children's House. In the early days of the community, this "house" was a succession of rooms in the "Middle House". For some time after being weaned, children still slept with their mothers at night. Once they reached a certain age, they were discouraged from sleeping in their mothers' rooms. Because the community was still concerned with creating a bond between the child and the community, children would often sleep in the bed of a community member. This member changed periodically so that no special attachments could form and detract from the overall communal commitment.
+
+==== Values of non-attachment ====
+Guidelines were established by the community to help direct parents in establishing an appropriate relationship with their child. Most of these guidelines were an extension of the principles of non-attachment and commitment to the communal ideal. The concern was that an excessive relationship would fail to teach the child the fundamentals of communal life. It was acceptable to be attached, as long as this was a general emotion of love and trust toward the community, rather than toward a particular individual. A mother's excessive attachment to her child was seen as a potential cause for illness or suffering on the child's part. In cases like this, it was often prescribed that the mother or child be temporarily moved to another community site for some time.
+
+=== Results ===
+The experiment with stirpiculture in the Oneida Community lasted from 1869 to 1879; 58 children were born as a result. Most men and women had only one child, but some had two or three, with 13 of these recorded as "accidental conceptions". To prove his religious and social prowess, as well as that of his bloodline, John H. Noyes and his son Theodore produced 12 children between them, 11 of whom survived. The Community was heavily invested in raising children to follow its ideals and guidelines, and values such as non-attachment were impressed upon children, even at a young age.
+Each child at Oneida was well supported and cared for within the community. They were given a lot of play time and rooms in which to play, as the Oneidans believed in the importance of exercise. Both girls and boys were provided an education, and some of the children were encouraged to go on to college and did so. They were under the constant guidance of older community members. Theodore Noyes, son of John H. Noyes, kept detailed records of the growth and development of the children produced and raised in the stirpiculture experiment. Only one was reported to have physical disabilities. The children learned the importance of non-attachment and commitment to the community; however, it is apparent that some special relationships did occur. The experiment ended in 1879, as the community began to break up.
+
+== References ==
+
+== Sources ==
+Carden, Maren Lockwood. Oneida: Utopian Community to Modern Corporation. Baltimore: Johns Hopkins Press, 1969.
+Ellis, John B. Free Love and Its Votaries (American Socialism Unmasked). (Chapter 15- "The Juvenile Saints" pgs. 221-237). A.L. Bancroft & Co; San Francisco, California (1870).
+Kinsley, Jessie Catherine. A Lasting Spring. Edited by Jane Kinsley Rich. New York: Syracuse University Press, 1983.
+Youcha, Geraldine. "The Oneida Community." Minding the Children: Child Care in America from Colonial Times to the Present (2005): p. 110. Da Capo Press.
+Noyes, John Humphrey. "Stirpiculture" The Circular Vol. II, No. 3, April 3, 1865.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Operational_analytical_processing-0.md b/data/en.wikipedia.org/wiki/Operational_analytical_processing-0.md
new file mode 100644
index 000000000..72c0b6942
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Operational_analytical_processing-0.md
@@ -0,0 +1,36 @@
+---
+title: "Operational analytical processing"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Operational_analytical_processing"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:54:54.131392+00:00"
+instance: "kb-cron"
+---
+
+Operational analytical processing, more popularly known as operational analytics, is a subset of data analytics that focuses on improving the operational nature of a business or entity.
+The main characteristic that distinguishes operational analytics from other types of analytics is that it is analytics on the fly, which means that signals emanating from various parts of a business are processed in real-time to feed back into instant decision-making for the business. This is sometimes referred to as "continuous analytics," which is another way to emphasize the continuous digital feedback loop that can exist from one part of a business to its other parts.
+
+
+== Overview ==
+The rapid digital transformation of many businesses means that an increasing number of business signals are being recorded and stored in digital form. Businesses use these signals to improve their efficiency and performance and to provide better experiences to their users and customers. A Forrester report details how the digitization of a business affects its customer experiences by leveraging data. Operational analytics makes it possible to process various types of information from different sources and then decide what to do next: what action to take, whom to talk to, what immediate plans to make. Gartner defines this as Continuous Intelligence in a research report and describes it as a design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events. Andreessen Horowitz describes a trend in which "more and more decisions are automated away altogether", giving the example of Amazon continually updating prices for its products throughout the day. This form of analytics has become popular with the digitization trend in almost all industry verticals, because it is digitization that furnishes the data needed for operational decision-making.
+One example of operational analytics is a product manager who looks at product-usage logs to determine which features of the product are liked by its users, which features slow them down, and which features are disliked. The product manager can gather all these answers by querying data that records usage patterns from the product's user base, and can immediately feed that information back to make the product better. Similarly, in marketing analytics in the pre-digitized world, a marketing manager would organize a few focus groups, try out a few experiments based on their own creativity and then implement them. Depending on the results of the experiments, they would then decide what to do next. An experiment could take weeks or months. In the digitized world, there is the "marketing engineer," a person who is well-versed in using data systems. Marketing engineers can run multiple experiments at once, gather results from the experiments in the form of data, terminate the ineffective experiments and nurture the ones that work, all through the use of data-based software systems. The more experiments they can run and the quicker the turnaround time of results, the more effective they are in marketing their product.
+An MIT Technology Review article describes how a ride-sharing application uses algorithms for real-time monitoring of traffic and trip times to balance demand and supply for ride sourcing, and to adjust fees accordingly and rapidly. The use of operational analytics is not confined to the field of information technology. Data from business intelligence, finance, science, weather, and even current events are combined and then analyzed together to extract valuable insight, which in turn drives quick decision-making in almost every conceivable use. A metrics collection system like Scuba is an operational analytics system because it is used extensively for interactive, ad hoc analysis queries that run in under a second over live data.
+
+
+== Definition of an operational analytics processing engine ==
+The definition of an operational analytics processing engine (OPAP) can be expressed in the form of the following six propositions:
+
+Complex queries: Support for queries like inner & outer joins, aggregations, sorting, relevance, etc.
+Low data latency: An update to any data record is visible in query results in under a few seconds.
+Low query latency: A simple search query returns in under a few milliseconds.
+High query volume: Able to serve at least a few hundred concurrent queries per second.
+Live sync with data sources: Ability to keep itself in sync with various external sources without having to write external scripts. This can be done via change-data-capture of an external database, or by tailing streaming data sources.
+Mixed types: Allows values of different types in the same column. This is needed to be able to ingest new data without needing to manipulate them at write time.
+
+
+== System requirements ==
+Operational analytics is a subset of the broader set of processes that characterizes OLAP (online analytical processing). As such, it inherits the large data sizes and complex queries that OLAP systems typically have to handle. However, the characteristic that uniquely identifies operational analytics is the requirement for quick predictions based on the most recent signals. This means that both the data latency and the query latency must be very small. For example, operational analytics applied to real-time business processes specifies that data latency be zero. It also means that queries should be fast and finish at interactive speeds. Because these decisions are taken at a micro-level and are very personalized to each individual entity, operational analytics processing is characterized by how easy it is to deliver personalized recommendations using such a system.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-0.md b/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-0.md
new file mode 100644
index 000000000..ee070831b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-0.md
@@ -0,0 +1,336 @@
+---
+title: "Optically stimulated luminescence thermochronometry"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:38.553390+00:00"
+instance: "kb-cron"
+---
+
+Optically stimulated luminescence (OSL) thermochronometry is a dating method used to determine the time since quartz and/or feldspar began to store charge as it cooled through the effective closure temperature. The closure temperature for quartz and Na-rich K-feldspar is 30-35 °C and 25 °C respectively. When quartz and feldspar are buried beneath the Earth's surface, they are hot. They cool when a geological process, such as focused erosion, causes their exhumation to the Earth's surface. As they cool, they trap electron charges originating from within the crystal lattice. These charges are accommodated within crystallographic defects or vacancies in their crystal lattices as the mineral cools below the closure temperature.
+During detrapping of these electrons, luminescence is produced. The luminescence, or light emission, from the mineral is assumed to be proportional to the trapped electron charge population. The age recorded in the standard OSL method is determined by counting the number of these trapped charges in an OSL detection system. The OSL age is the cooling age of the quartz and/or feldspar. This cooling history is a record of the mineral's thermal history, which is used to reconstruct the geological event.
+The sub-Quaternary period (10⁴ to 10⁵ years) is the geological time scale over which OSL is a favourable dating technique, because of the low closure temperature of the quartz and feldspar used in this technique. The Quaternary period is marked by intense crustal erosion, particularly within active mountain ranges, leading to high exhumation rates of crustal rocks and the formation of sub-Quaternary sediments. Previous techniques (e.g. apatite fission track, zircon fission track, and (uranium-thorium)/helium dating) could not adequately track the geological age record, particularly in the last ~300 thousand years. OSL dating is currently the only dating method that has been successfully applied to understand the cooling ages of geological events on this time scale.
+
+== Theoretical concepts of electron trapping and detrapping for OSL measurement ==
+In the natural environment, the crystal lattices of quartz and/or feldspar are bombarded with radiation released from radiogenic sources, such as in situ radioactive decay. As the crystals are irradiated, charges are stored in their crystallographic defects. The charge trapping process involves atomic-scale ionic substitution of both electrons and holes within the crystal lattices of quartz and feldspar. The electron diffusion happens in response to ionizing radiation as the minerals cool below their closure temperature.
+If quartz or feldspar grains are exposed to a natural light source such as the sun, the trapped charges are evicted in the form of luminescence. This natural process is called bleaching. Any other process that heats the sample will also cause the trapped electrons to escape from the crystal lattice, which is known as thermal bleaching. Because optical bleaching evicts the trapped charges in the minerals, careful sampling and handling are needed to avoid using a bleached sample for OSL thermochronometry. These two processes, optical and thermal stimulation, are also used to produce luminescence artificially in the laboratory for luminescence studies of the mineral.
+
+== Kinetic or rate equations for trapping and detrapping processes ==
+A wide range of kinetic models have been developed to explain trapping and detrapping processes in quartz and feldspar crystals. Two of these models are particularly useful in determining the cooling histories of quartz or feldspar. These models are known as the general order kinetic model and the band tail model. The two models consider three major processes to characterize the mineral luminescence: the trapping process, the thermal detrapping process and the athermal detrapping process. Each of the processes is governed by a different equation, discussed below. These models are useful for determining the cooling history of the mineral, which involves subtracting the sum of thermal detrapping and athermal detrapping from the trapping process (i.e. Trapping – (Thermal detrapping + Athermal detrapping)).
+
+=== Rate equations ===
+
+=== Determination of cooling history from the kinetic equations ===
+By combining the four equations above, a single differential equation is developed to convert the luminescence into a cooling rate. We have:
+
+$$\frac{d\tilde{n}}{dt} = \frac{D_{R}}{D_{o}}(1-\tilde{n})^{\alpha} - s\,\tilde{n}^{\beta}\exp\left(\frac{-E}{kT}\right) - s\,\tilde{n}^{\beta}\exp\left(-p'^{-\frac{1}{3}}r'\right)$$
+
+for the general order kinetic model; and
+
+$$\frac{d\tilde{n}}{dt} = \frac{D_{R}}{D_{o}}(1-\tilde{n})^{\alpha} - s\,\tilde{n}^{\beta}\exp\left(\frac{-(E_{t}-E_{b})}{kT}\right) - s\,\tilde{n}^{\beta}\exp\left(-p'^{-\frac{1}{3}}r'\right)$$
+
+for the band tail model.
+Either model can be used, because the same series of laboratory experiments is followed to estimate all the parameters involved in the equations. The inversion of the measured $\tilde{n}$ for a range of temperature-time histories (T-t paths) can be used to determine the cooling rate. A sufficient number of T-t paths conducted in the laboratory is used to build a probability density function, which helps to determine the most likely cooling histories undergone by the mineral.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-1.md b/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-1.md
new file mode 100644
index 000000000..93056d51b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-1.md
@@ -0,0 +1,167 @@
+---
+title: "Optically stimulated luminescence thermochronometry"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:38.553390+00:00"
+instance: "kb-cron"
+---
+
+== Sample preparation ==
+Bedrock samples from the Earth's surface or from boreholes are the earth materials required for OSL dating. Minerals (quartz and/or feldspar) are usually separated from the rock or sediment samples under a regulated laboratory lighting system, similar to the procedures used in archaeological OSL dating. The light source is usually a controlled red light, to avoid resetting the luminescence signal.[7] Samples are crushed gently to avoid generating heat strong enough to reset the OSL signal in the minerals. Crushed samples are separated by means of a sieve to obtain fine grains. Grain-size fractions of 90-125 microns, 100-200 microns or 180-212 microns can be used for OSL measurement. The selected grains are chemically treated with HCl to digest carbonates and with H2O2 to remove organic materials that can affect the sensitivity of the OSL signal during measurement. Feldspar and quartz, with densities of < 2.62 g cm⁻³ and < 2.68 g cm⁻³ respectively, are separated from other heavier minerals by density separation. Inclusions of zircon, apatite and feldspar in quartz, as well as alpha-irradiated grain edges that can contaminate the OSL signal, are removed by etching in hydrofluoric acid (HF).
+
+== OSL signal detection system ==
+OSL ages are commonly measured using an automated Risø Thermal Luminescence Reader (e.g. TL-DA-20). It contains an internal beta source (e.g. 90Sr/90Y), with optical stimulation provided by light-emitting diodes (LEDs). The reader also has a detection filter for transmission of the stimulated luminescence signals. During the measurement, the mineral grain (quartz or feldspar) is glued onto a stainless-steel disc, which sits on a heater strip, using an adhesive (commonly silicone spray). The mineral grain is stimulated with light from a series of light-emitting diodes. This stimulation releases the trapped electrons, which then begin to recombine in the crystal. During this process, they give off the OSL signal, which is collected and recorded by the light-sensitive photomultiplier tube. The photomultiplier tube converts the incident photons (i.e. light) to electronic charge. This is the basic principle by which the luminescence (light) emission from the minerals under investigation is measured.
+
+== OSL age determination ==
+
+To determine the OSL age of the sample, the dose rate ($D_{R}$) and the equivalent dose ($D_{E}$) are required. A dose is the quantity of natural radiation or energy absorbed by a mineral. The dose rate is the effective radiation absorbed from naturally occurring ionizing sources per unit time.
+The age is calculated as the ratio of the equivalent dose ($D_{E}$) to the dose rate ($D_{R}$) using the equation below.
+
+$$A = \frac{D_{E}}{D_{R}}$$
+
+where $A$ is the age (yr), $D_{E}$ is measured in gray (Gy) and $D_{R}$ is measured in Gy yr⁻¹. Note that 1 Gy is equivalent to 1 J kg⁻¹ (joule per kilogram).
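+
+As a rough worked illustration of the age equation above, assume a hypothetical equivalent dose of 150 Gy and a hypothetical dose rate of 0.003 Gy per year. These values are chosen only to make the arithmetic easy and are not taken from any measurement:
+
+```latex
+A = \frac{D_{E}}{D_{R}}
+  = \frac{150\ \mathrm{Gy}}{0.003\ \mathrm{Gy\,yr^{-1}}}
+  = 5 \times 10^{4}\ \mathrm{yr}
+```
+
+Under these assumed inputs the age would be 50,000 years. The units of gray cancel, leaving years, which is why the dose rate must be expressed per year for the age to come out in years.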
+
+=== Dose rate determination ===
+For a single grain of mineral, the dose rate ($D_{R}$) can be determined by measuring the concentrations of uranium, potassium and thorium by direct spectrometric analysis of quartz or feldspar grains. Ge-gamma spectrometry, INAA, X-ray fluorescence, and ICP-MS or ICP-OES are techniques that can be used. Other methods for the determination of the dose rate include: (1) overburden cosmic dose rate estimation, (2) the water content attenuation method, and (3) the disequilibrium dose rate correction method. An average dose rate is usually calculated as representative of the dose rate.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-2.md b/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-2.md
new file mode 100644
index 000000000..a7632847f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry-2.md
@@ -0,0 +1,79 @@
+---
+title: "Optically stimulated luminescence thermochronometry"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Optically_stimulated_luminescence_thermochronometry"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:38.553390+00:00"
+instance: "kb-cron"
+---
+
+=== Equivalent dose determination ===
+The equivalent dose ($D_{E}$), also referred to as the dose response, is determined from the dose response curve (see Plot B). The single-aliquot regenerative (SAR) protocol is a commonly used method for the determination of the equivalent dose. The protocol involves a series of laboratory measurements of the OSL signal (see Plot A) emitted by the aliquot when it is optically stimulated, after a known beta dose, for a given time in seconds. The beta source may be 90Sr/90Y in an automated Risø Thermal Luminescence Reader. In the SAR protocol, the measurements for quartz and feldspar differ mainly in the degree and duration of heating required and in the source of stimulation.
+The first stage involves determination of the natural dose (see Plot B) by preheating the aliquot to about 160-130 °C (for feldspar) for 10 s or 160-300 °C (for quartz) while the natural luminescence signal (i.e. natural dose) is still intact. This is done to remove unstable signals in the mineral. After preheating, the aliquot is optically stimulated by an infrared light-emitting diode (for feldspar) or a blue light-emitting diode (for quartz) (see OSL signal detection system) for 40 s at 125 °C (for feldspar) or 100 s at 125 °C (for quartz), and the natural OSL signal (NL) is measured and recorded by the photomultiplier tube. For the second stage, the aliquot is irradiated with a fixed, known test dose (beta dose) and preheated at a temperature below 160 °C. The IRSL signal is then measured as the test dose IRSL response (NT) after the aliquot has been optically stimulated for 40 s at 125 °C (for feldspar) or 100 s at 125 °C (for quartz). At this stage the aliquot is completely bleached. The regenerative dose measurements begin after bleaching.
+The same procedure as described above is followed, but a range of regenerative doses is given at different temperatures for sensitivity correction of the OSL signal (see Plot B). For the regenerative dose measurement, the aliquot is irradiated with a known dose before preheating at 160-130 °C for 10 s (for feldspar) or 160-300 °C (for quartz), and the signal response (Ri) is measured. A fixed test dose is then given by irradiating the aliquot, and the aliquot is preheated at a temperature below 160 °C. The aliquot is optically stimulated at the same rate and the IRSL signal (RT) is measured. The steps are repeated for a range of different regenerative doses, including a zero dose. During each of the tests, all OSL signals are recorded by the photomultiplier tube, and the OSL counts are plotted against the OSL exposure time in seconds, as shown in the OSL signal curve (first graph).
+For sensitivity correction, NL is plotted against NT to represent the natural OSL signal, while Ri is plotted against RT to represent the regenerative dose tests (see Plot B). The natural dose lies along the vertical axis because no laboratory dose is given at that stage. The regenerative dose measurements vary with the dose given at each stage. The equivalent dose ($D_{E}$) is determined by drawing a line (the red discontinuous line in Plot B) from the natural dose to intercept the regenerative dose curve. The dose value read off the horizontal axis at the point of interception is recorded as the equivalent dose (see Plot B).
+
+== Applications ==
+
+=== General applications ===
+OSL finds application in all low-temperature (<50 °C) tectonic and sedimentary processes. These studies mainly fall within the sub-Quaternary period and include, but are not limited to, focused fluvial and/or glacial erosion, rock exhumation and the evolution of topography in active tectonic regions. Other applications include glacial deposits, lagoon deposits, storm surge and tsunami deposits, lake deposits (including shoreline migration history), fluvial erosion deposits and loess deposit records. For example, the rate of slip on a normal fault plane can be modelled, as can the rate of glacial or fluvial erosion of the Earth's surface and the timing of sedimentary deposits formed within the sub-Quaternary period.
+In active tectonic regions, OSL dating is very useful for tracking the thermal history and the rate of rock exhumation towards the Earth's surface. The younger the cooling ages, the higher the rate of erosion and/or exhumation of the rock unit under investigation. When the OSL age of quartz or feldspar is known, the obtained ages are coupled with existing thermal-mechanical models (e.g. Pecube) to reconstruct the thermal-mechanical history.
+The OSL ages (see diagram), cooling ages and elevation data are plotted against the horizontal distance along which the samples and elevation data were collected, to interpret the exhumation rate of the rock or the evolution of the relief system through time. For example, OSL dating has been applied to determine the cooling histories of some rapidly eroding active regions at the sub-Quaternary time scale (i.e. 10⁴ to 10⁵ years). These examples are the Whataroa-Perth catchment area in the Southern Alps of New Zealand and the Namche Barwa-Gyala Peri dome in the eastern Himalaya. In the Namche Barwa-Gyala Peri dome, river erosion was prevalent, while glacial erosion was the main active process in the Whataroa-Perth catchment area. In both studies, the rate of exhumation and the evolution of the relief systems were estimated by inversion of OSL thermochronological ages.
+
+== See also ==
+Luminescence dating
+Optically stimulated luminescence (physics)
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Orbital_tuning-0.md b/data/en.wikipedia.org/wiki/Orbital_tuning-0.md
new file mode 100644
index 000000000..4c49e370b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Orbital_tuning-0.md
@@ -0,0 +1,27 @@
+---
+title: "Orbital tuning"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Orbital_tuning"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:39.748346+00:00"
+instance: "kb-cron"
+---
+
+Orbital tuning refers to the process of adjusting the time scale of a paleoenvironmental proxy record so that the observed fluctuations correspond to the Milankovitch cycles in the Earth's orbital motion. This is typically done to correct for dating uncertainty or in cases where dating is not possible, such as beyond the range of radiocarbon dating. 
+
+
+== Description ==
+Changes in the Earth's orbit affect the amount and distribution of sunlight that the Earth as a whole, and particular parts of the Earth, receive. Such changes are expected to introduce periodic climate changes on a time scale of 20-100 kyr. Long records of sedimentation or climate should record such variations. However, such records often have poorly constrained age scales. As a result, scientists will sometimes adjust the timing of the features in their samples to match the predictions of orbital theory in the hope of improving the accuracy of their data.
+
+
+== Methods and uses ==
+Orbital tuning is done by adjusting the timescale of a paleoclimate record to match variations in an insolation record. These small adjustments synchronize the paleoclimate proxy record with the orbital cycles. Where the two records do not line up, scientists adjust one or more points so that the curves correlate better. At long timescales, orbitally forced changes in insolation are known to have a strong influence on climates and ecosystems, so orbital tuning is often an attempt to align proxy records with a known driver (insolation).
+Orbital tuning may also be used in cases where changes in sedimentation rate or preservation cause gaps or hiatuses in a record that complicate the interpretation of proxy records. For example, disturbances, ecological changes, varying precipitation levels, and other processes can cause shifts in sedimentation or a loss of sediments, and orbital tuning has been used to improve sediment chronologies or to recover portions of sediment records that have been lost or affected. Orbital tuning is often used as a countermeasure to effects such as the mixing of top-layer sediments by biotic interactions and/or other disturbances to samples. Independent methods, such as the use of radiometric data, have been developed to support results adjusted by orbital tuning. Orbital tuning can be applied to a whole sample but can also be applied in short segments. Using it in short segments can greatly reduce the risk of manipulating the data.
+
+
+== Criticism ==
+Criticisms have been raised against orbital tuning, and the tool often needs multiple independent factors to validate its conclusions. Variations in sediment deposition rates are not always caused by orbital signals, yet tuning can attribute such effects to orbital forcing. For this reason, orbital tuning is applied as needed over shorter time spans to avoid "overtuning" a sample. Overtuning refers to cases where a record is tuned so heavily that all of the data appear to support synchronous changes simply because they were tuned to match that specific time scale. Overtuning can result in apparent features that have no basis in the real data, as occurred with the original SPECMAP record.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Ordered_key–value_store-0.md b/data/en.wikipedia.org/wiki/Ordered_key–value_store-0.md
new file mode 100644
index 000000000..cc7e30a64
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Ordered_key–value_store-0.md
@@ -0,0 +1,71 @@
+---
+title: "Ordered key–value store"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Ordered_key–value_store"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:54:55.294937+00:00"
+instance: "kb-cron"
+---
+
+An ordered key–value store (OKVS) is a type of data storage paradigm that can support multi-model databases. An OKVS is an ordered mapping of bytes to bytes. An OKVS keeps the key–value pairs sorted in the lexicographic order of their keys. OKVS systems provide different sets of features and performance trade-offs. Most of them are shipped as a library without network interfaces, in order to be embedded in another process. Most OKVS support ACID guarantees. Some OKVS are distributed databases. Ordered key–value stores have found their way into many modern database systems, including NewSQL database systems.
+
+
+== History ==
+The origin of the ordered key–value store stems from the work of Ken Thompson on dbm in 1979. Later, in 1991, Berkeley DB was released, featuring a B-tree backend that allowed the keys to stay sorted. Berkeley DB was said to be very fast and made its way into various commercial products. It was included in the Python standard library until version 2.7. In 2009, Tokyo Cabinet was released; it was superseded by Kyoto Cabinet, which supports both transactions and ordered keys. In 2011, LMDB was created to replace Berkeley DB in OpenLDAP. There is also Google's LevelDB, which was forked by Facebook in 2012 as RocksDB. In 2014, WiredTiger, a successor of Berkeley DB, was acquired by MongoDB, and since 2019 it has been the primary backend of the MongoDB database.
+Other notable implementations of the OKVS paradigm are Sophia and the SQLite3 LSM extension. Another notable use of the OKVS paradigm is the multi-model database system ArangoDB, which is based on RocksDB.
+Some NewSQL databases are supported by ordered key–value stores. JanusGraph, a property graph database, has both a Berkeley DB backend and FoundationDB backend.
+
+
+== Key concepts ==
+
+
+=== Lexicographic encoding ===
+There are algorithms that encode basic data types (boolean, string, number) and compositions of those data types inside sorted containers (tuple, list, vector) in a way that preserves their natural ordering. This makes it possible to work with an ordered key–value store without having to work directly with bytes. In FoundationDB, this is called the tuple layer.
+
+
+=== Range query ===
+Inside an OKVS, keys are ordered, which makes range queries possible. A range query retrieves all keys between two specified keys and returns them in sorted order.
+
+
+=== Subspaces ===
+
+
+=== Key composition ===
+One can construct key spaces to build higher-level abstractions. The idea is to construct keys that take advantage of the ordered nature of the top-level key space, so that one can query ranges of keys that share a particular pattern.
+
+
+=== Denormalization ===
+Denormalization, that is, repeating the same piece of data in multiple subspaces, is common practice. It makes it possible to create secondary representations, also called indices, that speed up queries.
+
+
+== Higher level abstractions ==
+The following abstractions or databases have been built on top of ordered key–value stores:
+
+Time series databases
+Record databases, also known as row-store databases, which behave similarly to an RDBMS
+Tuple stores, also known as triple stores or quad stores, and also generic tuple stores
+Document databases that mimic the MongoDB API
+Full-text search
+Geographic Information Systems
+Property Graph
+Versioned Data
+Vector space database for Approximate Nearest Neighbor
+All these abstractions can co-exist in the same OKVS database and, when ACID is supported, the operations happen with the guarantees offered by the transaction system.
+
+
+== Feature matrix ==
+
+
+== Use-cases ==
+OKVS are useful for two strategies. The first is to optimize a small feature, e.g. to gain a 10% improvement in read or write latency. The second is to take advantage of the distributed nature of FoundationDB and TiKV, whose resilience at very large scale has no equivalent. In both cases, users need to re-implement the high-level abstractions they require, because there are no portable, ready-to-use libraries of high-level abstractions. The balance between complexity, maintainability, fine-tuning, and readily available features is delicate, which keeps OKVS a choice for experts. Sometimes more specialized data structures can be faster than a high-level abstraction on top of an OKVS.
+Another interest of the OKVS paradigm stems from its simple and versatile interface, which makes it an interesting target for experimental storage algorithms and data structures.
+
+
+== See also ==
+Key–value database
+Wide-column store
+Multi-model database
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment-0.md b/data/en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment-0.md
new file mode 100644
index 000000000..5aeaf825e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment-0.md
@@ -0,0 +1,48 @@
+---
+title: "Organic Moderated Reactor Experiment"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:44.977297+00:00"
+instance: "kb-cron"
+---
+
+The Organic Moderated Reactor Experiment (OMRE) was a 16 MWt experimental organic nuclear reactor that operated at the National Reactor Testing Station from 1957 to 1963 to explore the use of hydrocarbons as coolant, moderator, and reflector materials in power reactor conditions. Such organic fluids are non-corrosive, do not become highly activated under irradiation, and can operate at low pressure and moderate temperature. These characteristics were considered promising towards the goal of achieving economical commercial nuclear power.
+The information provided by OMRE established the credibility of the organic nuclear reactor concept and led to the commercial demonstration at the Piqua Nuclear Generating Station. More recently, OMRE has been cited as providing key input and motivation for modern designs of such systems, which aim to improve the performance of new and advanced nuclear power plants in support of climate change mitigation.
+
+== Design ==
+
+The OMRE design efforts began in July 1955. It was originally intended to operate for 1 year.
+The objectives of the OMRE program were to obtain the following experimental information:
+
+Rate of radiation and thermal neutron damage to the hydrocarbon in the reactor
+Effect of this damage upon the operation of the reactor
+Suitable methods for ensuring satisfactory reactor operation in the presence of damaged hydrocarbon
+It was neither a pilot plant nor a prototype, but rather a minimum-cost experimental facility designed to investigate the feasibility of applying the organic concept to power reactors. It did not have an electric power conversion system.
+OMRE was designed to provide operational information on the response of diphenyl to high nuclear radiation and thermal neutron flux, with flexibility to test other polyphenyls such as terphenyl.
+The design criteria stated included:
+
+Maximum fuel surface temperature between 750 °F (400 °C) and 800 °F (430 °C)
+Bulk coolant temperature between 500 °F (260 °C) and 700 °F (370 °C)
+Coolant velocity in fuel plates up to 15 ft/s (4.6 m/s)
+Heat rejection capacity of 16 MWt
+25 fuel elements representing a total of 20.6 kg of U-235
+Fuel burnup of 11.2% U-235
+Average thermal neutron flux in fuel of 2 × 10¹³ n/cm²/s at 500 °F (260 °C)
+Reactor system pressure of 300 psi (21 bar)
+The fuel element was a stainless steel box in which 16 active fuel plates were held in longitudinal grooves. Each fuel plate consisted of a core of highly enriched uranium particles uniformly dispersed in a stainless-steel matrix, clad with 304 stainless steel and rolled into a plate 0.030 in (0.76 mm) thick, 2.760 in (70 mm) wide and 37 in (940 mm) long. The dimensions of the rectangular reactor core were 57 centimeters by 69 centimeters by 91 centimeters.
+The reactor vessel was filled with diphenyl to obtain 14 feet of radiation shielding above the reactor core at 250 °F (120 °C). It was pressurized up to 300 psi with the inert nitrogen pressurized to 200 psi (14 bar) to prevent boiling of the hydrocarbon. The nitrogen was continuously purged from the system to sweep out any hydrogen and light hydrocarbon gases, such as methane or ethane, produced by the decomposition of the coolant-moderator through pyrolysis and radiolysis, and to discharge them through the stack.
+Coolant was pumped at 7,200 US gal/min (27 cubic metres per minute) through an air-blast heat exchanger to dump the core heat to the atmosphere. A steam system and a power conversion system were omitted to simplify the construction and operation of the reactor experiment.
+At high temperature and under irradiation, the hydrocarbons decompose and form longer chains with increasing molecular weight. This gradually degrades the heat transfer and flow characteristics of the fluid. To mitigate this, a coolant-moderator purification system ran continuously to remove any hydrocarbons that had been damaged by heat or radiation. This was accomplished with a low-pressure distillation system.
+All systems were constructed with carbon steel, except the reactor vessel. All systems had heaters (including induction heating, resistance heating, and an oil-fired heater on the air-blast heat exchangers) to bring the system above the melting temperature of the coolant-moderator.
+
+== Construction ==
+
+Construction of OMRE began on June 17, 1956, and was completed in May 1957. The reactor containment was partially built underground and consisted of a concrete pad and a corrugated steel cylinder surrounded by compacted earth for radiation shielding.
+Clearing, grading, roads, walks, drainage, water supply, power substation, sanitary and process waste systems, fencing, security lighting, guard station, communications system, control and processing building, and reactor foundation excavation were performed in Phase I of the construction by the Idaho Operations Office and the Atomic Energy Commission. Some delays arose from late appropriations and a steel strike.
+The biggest setback was unsatisfactory performance of the control-rod drive mechanism. During testing, it became apparent that the original design would not work, and a new approach was needed.
+Process piping was constructed of Schedule 40 carbon steel.
+The buildings and utilities were constructed by Wadsworth & Arrington.
+
+== Operation ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment-1.md b/data/en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment-1.md
new file mode 100644
index 000000000..44e5538d8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment-1.md
@@ -0,0 +1,27 @@
+---
+title: "Organic Moderated Reactor Experiment"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Organic_Moderated_Reactor_Experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:44.977297+00:00"
+instance: "kb-cron"
+---
+
+The OMRE first achieved criticality on September 17, 1957, and reached full power at the beginning of February 1958. The reactor operated in two modes: without the purification system, and with the purification system. Seventeen tests were run with the first OMRE core throughout 1958 with reactor power between 0 and 12 MWt.
+The first three tests were system check-out tests, covering all major systems. Subsequent tests simulated the conditions expected to be encountered in the Piqua Nuclear Generating Station. Test 4 demonstrated that the pyrolytic decomposition rate in external piping was negligible. Tests 5–11 measured the decomposition rate and the effect of radiation damage on coolant-moderator heat-transfer characteristics. Tests 12 and 13 tested the purification system's ability to reduce the concentration of inorganic particulate matter while also reducing the high-boiler concentration from 40% to 8%.
+Three fuel element failures occurred during first core operation. Two occurred in experimental low-enriched assemblies with finned aluminum cladding due to inadequate coolant filtration, and the third was caused by improper element seating.
+By the end of the first year, the core had generated 958 MW-day of energy and been in operation for 5,600 hours. An extended shutdown followed to replace the core.
+Problems with coolant purification complicated the operation of the OMRE reactor. The polymerization of the terphenyl coolant (Santowax OM, subsequently Santowax R) led to fouling and blockage of coolant channels and to the installation of an on-line coolant purification system. These complications, together with the progress of water-cooled nuclear reactor technology, led the US Atomic Energy Commission to decide on December 10, 1962, to reduce the American organic nuclear reactor program, and ultimately to shut down OMRE on June 30, 1963. The Experimental Organic Cooled Reactor (EOCR) was built next to OMRE in anticipation of further development of the concept. During the final stages of its construction, EOCR was also placed in standby, and it never operated.
+
+== Decommissioning ==
+Immediately following the final OMRE shutdown, the nuclear fuel and reactor vessel internals were removed, and the organic coolant Santowax R (a commercial name for a mixture of terphenyl and diphenyl isomers) was drained from all the systems. The facility remained in this deactivated condition until 1977.
+The facility was eventually decontaminated and decommissioned between October 1977 and September 1979. The process was complicated by the existence of some remaining toxic and flammable Santowax-R and xylene, a neutron-activated radioactive vessel emitting 350 R/h, and asbestos insulation. Furthermore, due to insufficient neutron shielding being included in the design, "an extraordinary, unexpected amount of activated rock and soil was removed."
+The surface radiation of the excavation and backfill material was brought to 20 R/h or less, and the nuclide content of the backfill soil was brought below 0.5 pCi/g.
+The decommissioning effort was initially estimated in 1977 to cost $700,000 (equivalent to $3,700,000 in 2025) and take 2 years, and was completed on time and under budget, for a total cost of $500,000 (equivalent to $2,700,000 in 2025).
+
+== External links ==
+Organic Moderated Reactor Experiment (1958 documentary film)
+Organic cooled reactors: Five Fast Facts (2019 American Nuclear Society article)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Persistence_module-0.md b/data/en.wikipedia.org/wiki/Persistence_module-0.md
new file mode 100644
index 000000000..e31e1b3e9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Persistence_module-0.md
@@ -0,0 +1,726 @@
+---
+title: "Persistence module"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Persistence_module"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:54:56.529466+00:00"
+instance: "kb-cron"
+---
+
+A persistence module is a mathematical structure in persistent homology and topological data analysis that formally captures the persistence of topological features of an object across a range of scale parameters. A persistence module often consists of a collection of homology groups (or vector spaces if using field coefficients) corresponding to a filtration of topological spaces, and a collection of linear maps induced by the inclusions of the filtration. The concept of a persistence module was first introduced in 2005 as an application of graded modules over polynomial rings, thus importing well-developed algebraic ideas from classical commutative algebra theory to the setting of persistent homology. Since then, persistence modules have been one of the primary algebraic structures studied in the field of applied topology.
+
+== Definition ==
+
+=== Single Parameter Persistence Modules ===
+Let $T$ be a totally ordered set and let $K$ be a field. The set $T$ is sometimes called the indexing set. Then a single-parameter persistence module $M$ is a functor $M:T\to \mathbf{Vec}_{K}$ from the poset category of $T$ to the category of vector spaces over $K$ and linear maps. A single-parameter persistence module indexed by a discrete poset such as the integers can be represented intuitively as a diagram of spaces: $\cdots \to M_{-1}\to M_{0}\to M_{1}\to M_{2}\to \cdots$
+To emphasize the indexing set being used, a persistence module indexed by $T$ is sometimes called a $T$-persistence module, or simply a $T$-module. Common choices of indexing sets include $\mathbb{R},\mathbb{Z},\mathbb{N}$, etc.
+One can alternatively use a set-theoretic definition of a persistence module that is equivalent to the categorical viewpoint: A persistence module is a pair $(V,\pi)$ where $V$ is a collection $\{V_{z}\}_{z\in T}$ of $K$-vector spaces and $\pi$ is a collection $\{\pi_{y,z}\}_{y\leq z\in T}$ of linear maps where $\pi_{y,z}:V_{y}\to V_{z}$ for each $y\leq z\in T$, such that $\pi_{y,z}\circ \pi_{x,y}=\pi_{x,z}$ for any $x\leq y\leq z\in T$ (i.e., all the maps commute).
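+As a concrete illustration of the set-theoretic definition, consider the following toy example. It is a hypothetical instance chosen here for illustration, not one taken from the article's sources: the indexing set is assumed to be $T=\{0,1,2\}$ with its usual order, $K$ is any field, and the spaces and maps are picked arbitrarily. The last line checks the composition rule.
+
+```latex
+\begin{aligned}
+V_{0} &= K, \qquad V_{1} = K^{2}, \qquad V_{2} = K, \\
+\pi_{0,1}(a) &= (a,0), \qquad \pi_{1,2}(a,b) = a+b, \qquad \pi_{z,z} = \operatorname{id}_{V_{z}}, \\
+(\pi_{1,2}\circ \pi_{0,1})(a) &= \pi_{1,2}(a,0) = a
+\quad\Longrightarrow\quad \pi_{0,2} = \operatorname{id}_{K}.
+\end{aligned}
+```
+
+In this sketch the two shorter maps determine $\pi_{0,2}$: any other choice for $\pi_{0,2}$ would violate $\pi_{1,2}\circ \pi_{0,1}=\pi_{0,2}$, so the pair $(V,\pi)$ would no longer be a persistence module.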
+
+=== Multiparameter Persistence Modules ===
+Let $P$ be a product of $n$ totally ordered sets, i.e., $P=T_{1}\times \dots \times T_{n}$ for some totally ordered sets $T_{i}$. Then by endowing $P$ with the product partial order given by $(s_{1},\dots ,s_{n})\leq (t_{1},\dots ,t_{n})$ only if $s_{i}\leq t_{i}$ for all $i=1,\dots ,n$, we can define a multiparameter persistence module indexed by $P$ as a functor $M:P\to \mathbf{Vec}_{K}$. This is a generalization of single-parameter persistence modules, and in particular, this agrees with the single-parameter definition when $n=1$.
+In this case, a $P$-persistence module is referred to as an $n$-dimensional or $n$-parameter persistence module, or simply a multiparameter or multidimensional module if the number of parameters is already clear from context.
+
+Multidimensional persistence modules were first introduced in 2009 by Carlsson and Zomorodian. Since then, there has been a significant amount of research into the theory and practice of working with multidimensional modules, since they provide more structure for studying the shape of data. Namely, multiparameter modules can have greater density sensitivity and robustness to outliers than single-parameter modules, making them a potentially useful tool for data analysis.
+One downside of multiparameter persistence is its inherent complexity. This makes performing computations related to multiparameter persistence modules difficult. In the worst case, the computational complexity of multidimensional persistent homology is exponential.
+The most common way to measure the similarity of two multiparameter persistence modules is using the interleaving distance, which is an extension of the bottleneck distance.
+
+== Examples ==
+
+=== Homology Modules ===
+When using homology with coefficients in a field, a homology group has the structure of a vector space. Therefore, given a filtration of spaces $F:P\to \mathbf{Top}$, by applying the homology functor at each index we obtain a persistence module $H_{i}(F):P\to \mathbf{Vec}_{K}$ for each $i=1,2,\dots$ called the ($i$th-dimensional) homology module of $F$. The vector spaces of the homology module can be defined index-wise as $H_{i}(F)_{z}=H_{i}(F_{z})$ for all $z\in P$, and the linear maps are induced by the inclusion maps of $F$.
+Homology modules are the most ubiquitous examples of persistence modules, as they encode information about the number and scale of topological features of an object (usually derived from building a filtration on a point cloud) in a purely algebraic structure, thus making understanding the shape of the data amenable to algebraic techniques, imported from well-developed areas of mathematics such as commutative algebra and representation theory.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Persistence_module-1.md b/data/en.wikipedia.org/wiki/Persistence_module-1.md
new file mode 100644
index 000000000..45ffb0d8b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Persistence_module-1.md
@@ -0,0 +1,1002 @@
+---
+title: "Persistence module"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Persistence_module"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:54:56.529466+00:00"
+instance: "kb-cron"
+---
+
+=== Interval Modules ===
+A primary concern in the study of persistence modules is whether modules can be decomposed into "simpler pieces", roughly speaking. In particular, it is algebraically and computationally convenient if a persistence module can be expressed as a direct sum of smaller modules known as interval modules.
+Let $J$ be a nonempty subset of a poset $P$. Then $J$ is an interval in $P$ if
+
+For every $x,z\in J$ if $x\leq y\leq z\in P$ then $y\in J$
+For every $x,z\in J$ there is a sequence of elements $p_{1},p_{2},\dots ,p_{n}\in J$ such that $p_{1}=x$, $p_{n}=z$, and $p_{i},p_{j}$ are comparable for all $i,j\in \{1,\dots ,n\}$.
+Now given an interval $J\subseteq P$ we can define a persistence module $\mathbb{I}^{J}$ index-wise as follows:
+
+$$\mathbb{I}_{z}^{J}:={\begin{cases}K&{\text{if }}z\in J\\0&{\text{otherwise}}\end{cases}};\qquad \mathbb{I}_{y,z}^{J}:={\begin{cases}\operatorname{id}_{K}&{\text{if }}y\leq z\in J\\0&{\text{otherwise}}\end{cases}}.$$
+
+The module $\mathbb{I}^{J}$ is called an interval module.
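+For instance, the definition can be unpacked on a small case. The choices below are assumptions made purely for illustration: the poset is taken to be $P=\mathbb{Z}$ and the interval $J=\{1,2\}$, which is convex and connected in $\mathbb{Z}$ and therefore satisfies both interval conditions.
+
+```latex
+\begin{aligned}
+\mathbb{I}^{J}_{z} &=
+\begin{cases} K & \text{if } z\in \{1,2\} \\ 0 & \text{otherwise} \end{cases} \\
+\mathbb{I}^{J}_{1,1} &= \mathbb{I}^{J}_{1,2} = \mathbb{I}^{J}_{2,2} = \operatorname{id}_{K},
+\qquad \text{all other maps are } 0, \\
+\cdots \to 0 \to 0 &\to \underbrace{K}_{z=1} \xrightarrow{\ \operatorname{id}_{K}\ } \underbrace{K}_{z=2} \to 0 \to 0 \to \cdots
+\end{aligned}
+```
+
+Read as a diagram, this module is zero everywhere except on the two indices in $J$, where a single copy of $K$ is carried along by the identity map.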
+
+=== Free Modules ===
+Let $a\in P$. Then we can define a persistence module $Q^{a}$ with respect to $a$ where the spaces are given by
+
+$$Q_{z}^{a}:={\begin{cases}K&{\text{if }}z\geq a\\0&{\text{otherwise}}\end{cases}},$$
+
+and the maps defined via
+
+$$Q_{y,z}^{a}:={\begin{cases}\operatorname{id}_{K}&{\text{if }}z\geq a\\0&{\text{otherwise}}\end{cases}}.$$
+
+Then $Q^{a}$ is known as a free (persistence) module.
+One can also define a free module in terms of decomposition into interval modules. For each 
+  
+    
+      
+        a
+        ∈
+        P
+      
+    
+    {\displaystyle a\in P}
+  
+ define the interval 
+  
+    
+      
+        
+          a
+          
+            ⌞
+          
+        
+        :=
+        {
+        b
+        ∈
+        P
+        ∣
+        b
+        ≥
+        a
+        }
+      
+    
+    {\displaystyle a^{\llcorner }:=\{b\in P\mid b\geq a\}}
+  
+, sometimes called a "free interval." Then a persistence module 
+  
+    
+      
+        F
+      
+    
+    {\displaystyle F}
+  
+ is a free module if there exists a multiset 
+  
+    
+      
+        
+          
+            J
+          
+        
+        (
+        F
+        )
+        ⊆
+        P
+      
+    
+    {\displaystyle {\mathfrak {J}}(F)\subseteq P}
+  
+ such that 
+  
+    
+      
+        F
+        =
+        
+          ⨁
+          
+            a
+            ∈
+            
+              
+                J
+              
+            
+            (
+            F
+            )
+          
+        
+        
+          
+            I
+          
+          
+            
+              a
+              
+                ⌞
+              
+            
+          
+        
+      
+    
+    {\displaystyle F=\bigoplus _{a\in {\mathfrak {J}}(F)}\mathbb {I} ^{a^{\llcorner }}}
+  
+. In other words, a module is a free module if it can be decomposed as a direct sum of free interval modules.
+
+== Properties ==
+
+=== Finite Type Conditions ===
+A persistence module 
+  
+    
+      
+        M
+      
+    
+    {\displaystyle M}
+  
+ indexed over 
+  
+    
+      
+        
+          N
+        
+      
+    
+    {\displaystyle \mathbb {N} }
+  
+ is said to be of finite type if the following conditions hold for all 
+  
+    
+      
+        n
+        ∈
+        
+          N
+        
+      
+    
+    {\displaystyle n\in \mathbb {N} }
+  
+:
+
+Each vector space 
+  
+    
+      
+        
+          M
+          
+            n
+          
+        
+      
+    
+    {\displaystyle M_{n}}
+  
+ is finite-dimensional.
+There exists an integer 
+  
+    
+      
+        N
+      
+    
+    {\displaystyle N}
+  
+ such that the map 
+  
+    
+      
+        
+          M
+          
+            N
+            ,
+            n
+          
+        
+      
+    
+    {\displaystyle M_{N,n}}
+  
+ is an isomorphism for all 
+  
+    
+      
+        n
+        ≥
+        N
+      
+    
+    {\displaystyle n\geq N}
+  
+.
+If 
+  
+    
+      
+        M
+      
+    
+    {\displaystyle M}
+  
+ satisfies the first condition, then 
+  
+    
+      
+        M
+      
+    
+    {\displaystyle M}
+  
+ is commonly said to be pointwise finite-dimensional (p.f.d.). The notion of pointwise finite-dimensionality immediately extends to arbitrary indexing sets.
+The definition of finite type can also be adapted to continuous indexing sets. Namely, a module 
+  
+    
+      
+        M
+      
+    
+    {\displaystyle M}
+  
+ indexed over 
+  
+    
+      
+        
+          R
+        
+      
+    
+    {\displaystyle \mathbb {R} }
+  
+ is of finite type if 
+  
+    
+      
+        M
+      
+    
+    {\displaystyle M}
+  
+ is p.f.d., and 
+  
+    
+      
+        M
+      
+    
+    {\displaystyle M}
+  
+ contains a finite number of unique vector spaces. Formally speaking, this requires that for all but a finite number of points 
+  
+    
+      
+        x
+        ∈
+        
+          R
+        
+      
+    
+    {\displaystyle x\in \mathbb {R} }
+  
+ there is a neighborhood 
+  
+    
+      
+        N
+      
+    
+    {\displaystyle N}
+  
+ of 
+  
+    
+      
+        x
+      
+    
+    {\displaystyle x}
+  
+ such that 
+  
+    
+      
+        
+          M
+          
+            y
+          
+        
+        ≅
+        
+          M
+          
+            z
+          
+        
+      
+    
+    {\displaystyle M_{y}\cong M_{z}}
+  
+ for all 
+  
+    
+      
+        y
+        ,
+        z
+        ∈
+        N
+      
+    
+    {\displaystyle y,z\in N}
+  
+, and also that there is some 
+  
+    
+      
+        w
+        ∈
+        
+          R
+        
+      
+    
+    {\displaystyle w\in \mathbb {R} }
+  
+ such that 
+  
+    
+      
+        
+          M
+          
+            v
+          
+        
+        =
+        0
+      
+    
+    {\displaystyle M_{v}=0}
+  
+ for all 
+  
+    
+      
+        v
+        ≤
+        w
+      
+    
+    {\displaystyle v\leq w}
+  
+. A module satisfying only the former property is sometimes labeled essentially discrete, whereas a module satisfying both properties is known as essentially finite.
+An 
+  
+    
+      
+        
+          R
+        
+      
+    
+    {\displaystyle \mathbb {R} }
+  
+-persistence module is said to be semicontinuous if for any 
+  
+    
+      
+        x
+        ∈
+        
+          R
+        
+      
+    
+    {\displaystyle x\in \mathbb {R} }
+  
+ and any 
+  
+    
+      
+        y
+        ≤
+        x
+      
+    
+    {\displaystyle y\leq x}
+  
+ sufficiently close to 
+  
+    
+      
+        x
+      
+    
+    {\displaystyle x}
+  
+, the map 
+  
+    
+      
+        
+          M
+          
+            y
+            ,
+            x
+          
+        
+        :
+        
+          M
+          
+            y
+          
+        
+        →
+        
+          M
+          
+            x
+          
+        
+      
+    
+    {\displaystyle M_{y,x}:M_{y}\to M_{x}}
+  
+ is an isomorphism. Note that this condition is redundant if the other finite type conditions above are satisfied, so it is not typically included in the definition, but is relevant in certain circumstances.
+
+=== Structure Theorem ===
+One of the primary goals in the study of persistence modules is to classify modules according to their decomposability into interval modules. A persistence module that admits a decomposition as a direct sum of interval modules is often simply called "interval decomposable." One of the primary results in this direction is that any p.f.d. persistence module indexed over a totally ordered set is interval decomposable. This is sometimes referred to as the "structure theorem for persistence modules."
+
+The case when 
+  
+    
+      
+        P
+      
+    
+    {\displaystyle P}
+  
+ is finite is a straightforward application of the structure theorem for finitely generated modules over a principal ideal domain. For modules indexed over 
+  
+    
+      
+        
+          Z
+        
+      
+    
+    {\displaystyle \mathbb {Z} }
+  
+, the first known proof of the structure theorem is due to Webb. The theorem was extended to the case of 
+  
+    
+      
+        
+          R
+        
+      
+    
+    {\displaystyle \mathbb {R} }
+  
+ (or any totally ordered set containing a countable subset that is dense in 
+  
+    
+      
+        
+          R
+        
+      
+    
+    {\displaystyle \mathbb {R} }
+  
+ with the order topology) by Crawley-Boevey in 2015. The generalized version of the structure theorem, i.e., for p.f.d. modules indexed over arbitrary totally ordered sets, was established by Botnan and Crawley-Boevey in 2019.
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Persistent_Betti_number-0.md b/data/en.wikipedia.org/wiki/Persistent_Betti_number-0.md
new file mode 100644
index 000000000..899eaa920
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Persistent_Betti_number-0.md
@@ -0,0 +1,489 @@
+---
+title: "Persistent Betti number"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Persistent_Betti_number"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:54:57.708149+00:00"
+instance: "kb-cron"
+---
+
+In persistent homology, a persistent Betti number is a multiscale analog of a Betti number that tracks the number of topological features that persist over multiple scale parameters in a filtration. Whereas the classical 
+  
+    
+      
+        
+          n
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle n^{th}}
+  
+ Betti number equals the rank of the 
+  
+    
+      
+        
+          n
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle n^{th}}
+  
+ homology group, the 
+  
+    
+      
+        
+          n
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle n^{th}}
+  
+ persistent Betti number is the rank of the 
+  
+    
+      
+        
+          n
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle n^{th}}
+  
+ persistent homology group. The concept of a persistent Betti number was introduced by Herbert Edelsbrunner, David Letscher, and Afra Zomorodian in the 2002 paper Topological Persistence and Simplification, one of the seminal papers in the field of persistent homology and topological data analysis. Applications of the persistent Betti number appear in a variety of fields including data analysis, machine learning, and physics.
+
+
+== Definition ==
+Let 
+  
+    
+      
+        K
+      
+    
+    {\displaystyle K}
+  
+ be a simplicial complex, and let 
+  
+    
+      
+        f
+        :
+        K
+        →
+        
+          R
+        
+      
+    
+    {\displaystyle f:K\to \mathbb {R} }
+  
+ be a monotonic, i.e., non-decreasing function. Requiring monotonicity guarantees that the sublevel set 
+  
+    
+      
+        K
+        (
+        a
+        )
+        :=
+        
+          f
+          
+            −
+            1
+          
+        
+        (
+        −
+        ∞
+        ,
+        a
+        ]
+      
+    
+    {\displaystyle K(a):=f^{-1}(-\infty ,a]}
+  
+ is a subcomplex of 
+  
+    
+      
+        K
+      
+    
+    {\displaystyle K}
+  
+ for all 
+  
+    
+      
+        a
+        ∈
+        
+          R
+        
+      
+    
+    {\displaystyle a\in \mathbb {R} }
+  
+. Letting the parameter 
+  
+    
+      
+        a
+      
+    
+    {\displaystyle a}
+  
+ vary, we can arrange these subcomplexes into a nested sequence 
+  
+    
+      
+        ∅
+        =
+        
+          K
+          
+            0
+          
+        
+        ⊆
+        
+          K
+          
+            1
+          
+        
+        ⊆
+        ⋯
+        ⊆
+        
+          K
+          
+            n
+          
+        
+        =
+        K
+      
+    
+    {\displaystyle \emptyset =K_{0}\subseteq K_{1}\subseteq \cdots \subseteq K_{n}=K}
+  
+ for some natural number 
+  
+    
+      
+        n
+      
+    
+    {\displaystyle n}
+  
+. This sequence defines a filtration on the complex 
+  
+    
+      
+        K
+      
+    
+    {\displaystyle K}
+  
+.
+Persistent homology concerns itself with the evolution of topological features across a filtration. To that end, by taking the 
+  
+    
+      
+        
+          p
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle p^{th}}
+  
+ homology group of every complex in the filtration we obtain a sequence of homology groups 
+  
+    
+      
+        0
+        =
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            0
+          
+        
+        )
+        →
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            1
+          
+        
+        )
+        →
+        ⋯
+        →
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            n
+          
+        
+        )
+        =
+        
+          H
+          
+            p
+          
+        
+        (
+        K
+        )
+      
+    
+    {\displaystyle 0=H_{p}(K_{0})\to H_{p}(K_{1})\to \cdots \to H_{p}(K_{n})=H_{p}(K)}
+  
+ that are connected by homomorphisms induced by the inclusion maps in the filtration. When applying homology over a field, we get a sequence of vector spaces and linear maps commonly known as a persistence module.
+In order to track the evolution of homological features as opposed to the static topological information at each individual index, one needs to count only the number of nontrivial homology classes that persist in the filtration, i.e., that remain nontrivial across multiple scale parameters. 
+For each 
+  
+    
+      
+        i
+        ≤
+        j
+      
+    
+    {\displaystyle i\leq j}
+  
+, let 
+  
+    
+      
+        
+          f
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+      
+    
+    {\displaystyle f_{p}^{i,j}}
+  
+ denote the induced homomorphism 
+  
+    
+      
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            i
+          
+        
+        )
+        →
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            j
+          
+        
+        )
+      
+    
+    {\displaystyle H_{p}(K_{i})\to H_{p}(K_{j})}
+  
+. Then the 
+  
+    
+      
+        
+          p
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle p^{th}}
+  
+ persistent homology groups are defined to be the images of each induced map. Namely, 
+  
+    
+      
+        
+          H
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+        :=
+        im
+        ⁡
+        
+          f
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+      
+    
+    {\displaystyle H_{p}^{i,j}:=\operatorname {im} f_{p}^{i,j}}
+  
+ for all 
+  
+    
+      
+        0
+        ≤
+        i
+        ≤
+        j
+        ≤
+        n
+      
+    
+    {\displaystyle 0\leq i\leq j\leq n}
+  
+.
+In parallel to the classical Betti number, the 
+  
+    
+      
+        
+          p
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle p^{th}}
+  
+ persistent Betti numbers are precisely the ranks of the 
+  
+    
+      
+        
+          p
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle p^{th}}
+  
+ persistent homology groups, given by the definition 
+  
+    
+      
+        
+          β
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+        :=
+        rank
+        ⁡
+        
+          H
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+      
+    
+    {\displaystyle \beta _{p}^{i,j}:=\operatorname {rank} H_{p}^{i,j}}
+  
+.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Persistent_homology_group-0.md b/data/en.wikipedia.org/wiki/Persistent_homology_group-0.md
new file mode 100644
index 000000000..90c2da5da
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Persistent_homology_group-0.md
@@ -0,0 +1,1060 @@
+---
+title: "Persistent homology group"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Persistent_homology_group"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:54:58.899489+00:00"
+instance: "kb-cron"
+---
+
+In persistent homology, a persistent homology group is a multiscale analog of a homology group that captures information about the evolution of topological features across a filtration of spaces. While the ordinary homology group represents nontrivial homology classes of an individual topological space, the persistent homology group tracks only those classes that remain nontrivial across multiple parameters in the underlying filtration. Analogous to the ordinary Betti number, the ranks of the persistent homology groups are known as the persistent Betti numbers. Persistent homology groups were first introduced by Herbert Edelsbrunner, David Letscher, and Afra Zomorodian in the 2002 paper Topological Persistence and Simplification, one of the foundational papers in the fields of persistent homology and topological data analysis. That paper builds largely on persistence barcodes and the persistence algorithm, which were first described by Serguei Barannikov in a 1994 paper. Since then, the study of persistent homology groups has led to applications in data science, machine learning, materials science, biology, and economics.
+
+
+== Definition ==
+Let 
+  
+    
+      
+        K
+      
+    
+    {\displaystyle K}
+  
+ be a simplicial complex, and let 
+  
+    
+      
+        f
+        :
+        K
+        →
+        
+          R
+        
+      
+    
+    {\displaystyle f:K\to \mathbb {R} }
+  
+ be a real-valued monotonic function. Then for some values 
+  
+    
+      
+        
+          a
+          
+            0
+          
+        
+        <
+        
+          a
+          
+            1
+          
+        
+        <
+        ⋯
+        <
+        
+          a
+          
+            n
+          
+        
+        ∈
+        
+          R
+        
+      
+    
+    {\displaystyle a_{0}<a_{1}<\cdots <a_{n}\in \mathbb {R} }
+  
+ the sublevel-sets 
+  
+    
+      
+        K
+        (
+        a
+        )
+        :=
+        
+          f
+          
+            −
+            1
+          
+        
+        (
+        −
+        ∞
+        ,
+        a
+        ]
+      
+    
+    {\displaystyle K(a):=f^{-1}(-\infty ,a]}
+  
+ yield a sequence of nested subcomplexes 
+  
+    
+      
+        ∅
+        =
+        
+          K
+          
+            0
+          
+        
+        ⊆
+        
+          K
+          
+            1
+          
+        
+        ⊆
+        ⋯
+        ⊆
+        
+          K
+          
+            n
+          
+        
+        =
+        K
+      
+    
+    {\displaystyle \emptyset =K_{0}\subseteq K_{1}\subseteq \cdots \subseteq K_{n}=K}
+  
+ known as a filtration of 
+  
+    
+      
+        K
+      
+    
+    {\displaystyle K}
+  
+.
+Applying 
+  
+    
+      
+        
+          p
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle p^{th}}
+  
+ homology to each complex yields a sequence of homology groups 
+  
+    
+      
+        0
+        =
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            0
+          
+        
+        )
+        →
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            1
+          
+        
+        )
+        →
+        ⋯
+        →
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            n
+          
+        
+        )
+        =
+        
+          H
+          
+            p
+          
+        
+        (
+        K
+        )
+      
+    
+    {\displaystyle 0=H_{p}(K_{0})\to H_{p}(K_{1})\to \cdots \to H_{p}(K_{n})=H_{p}(K)}
+  
+ connected by homomorphisms induced by the inclusion maps of the underlying filtration. When homology is taken over a field, we get a sequence of vector spaces and linear maps known as a persistence module.
+Let 
+  
+    
+      
+        
+          f
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+        :
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            i
+          
+        
+        )
+        →
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            j
+          
+        
+        )
+      
+    
+    {\displaystyle f_{p}^{i,j}:H_{p}(K_{i})\to H_{p}(K_{j})}
+  
+ be the homomorphism induced by the inclusion 
+  
+    
+      
+        
+          K
+          
+            i
+          
+        
+        ↪
+        
+          K
+          
+            j
+          
+        
+      
+    
+    {\displaystyle K_{i}\hookrightarrow K_{j}}
+  
+. Then the 
+  
+    
+      
+        
+          p
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle p^{th}}
+  
+ persistent homology groups are defined as the images 
+  
+    
+      
+        
+          H
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+        :=
+        im
+        ⁡
+        
+          f
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+      
+    
+    {\displaystyle H_{p}^{i,j}:=\operatorname {im} f_{p}^{i,j}}
+  
+ for all 
+  
+    
+      
+        1
+        ≤
+        i
+        ≤
+        j
+        ≤
+        n
+      
+    
+    {\displaystyle 1\leq i\leq j\leq n}
+  
+. In particular, the persistent homology group 
+  
+    
+      
+        
+          H
+          
+            p
+          
+          
+            i
+            ,
+            i
+          
+        
+        =
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            i
+          
+        
+        )
+      
+    
+    {\displaystyle H_{p}^{i,i}=H_{p}(K_{i})}
+  
+.
+More precisely, the 
+  
+    
+      
+        
+          p
+          
+            t
+            h
+          
+        
+      
+    
+    {\displaystyle p^{th}}
+  
+ persistent homology group can be defined as 
+  
+    
+      
+        
+          H
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+        =
+        
+          Z
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            i
+          
+        
+        )
+        
+          /
+        
+        
+          (
+          
+            
+              B
+              
+                p
+              
+            
+            (
+            
+              K
+              
+                j
+              
+            
+            )
+            ∩
+            
+              Z
+              
+                p
+              
+            
+            (
+            
+              K
+              
+                i
+              
+            
+            )
+          
+          )
+        
+      
+    
+    {\displaystyle H_{p}^{i,j}=Z_{p}(K_{i})/\left(B_{p}(K_{j})\cap Z_{p}(K_{i})\right)}
+  
+, where 
+  
+    
+      
+        
+          Z
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            ∙
+          
+        
+        )
+      
+    
+    {\displaystyle Z_{p}(K_{\bullet })}
+  
+ and 
+  
+    
+      
+        
+          B
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            ∙
+          
+        
+        )
+      
+    
+    {\displaystyle B_{p}(K_{\bullet })}
+  
+ are the standard p-cycle and p-boundary groups, respectively.
+
+
+== Birth and death of homology classes ==
+Sometimes the elements of 
+  
+    
+      
+        
+          H
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+      
+    
+    {\displaystyle H_{p}^{i,j}}
+  
+ are described as the homology classes that are "born" at or before 
+  
+    
+      
+        
+          K
+          
+            i
+          
+        
+      
+    
+    {\displaystyle K_{i}}
+  
+ and that have not yet "died" entering 
+  
+    
+      
+        
+          K
+          
+            j
+          
+        
+      
+    
+    {\displaystyle K_{j}}
+  
+. These notions can be made precise as follows. A homology class 
+  
+    
+      
+        γ
+        ∈
+        
+          H
+          
+            p
+          
+        
+        (
+        
+          K
+          
+            i
+          
+        
+        )
+      
+    
+    {\displaystyle \gamma \in H_{p}(K_{i})}
+  
+ is said to be born at 
+  
+    
+      
+        
+          K
+          
+            i
+          
+        
+      
+    
+    {\displaystyle K_{i}}
+  
+ if it is not contained in the image of the previous persistent homology group, i.e., 
+  
+    
+      
+        γ
+        ∉
+        
+          H
+          
+            p
+          
+          
+            i
+            −
+            1
+            ,
+            i
+          
+        
+      
+    
+    {\displaystyle \gamma \notin H_{p}^{i-1,i}}
+  
+. Conversely, 
+  
+    
+      
+        γ
+      
+    
+    {\displaystyle \gamma }
+  
+ is said to die entering 
+  
+    
+      
+        
+          K
+          
+            j
+          
+        
+      
+    
+    {\displaystyle K_{j}}
+  
+ if 
+  
+    
+      
+        γ
+      
+    
+    {\displaystyle \gamma }
+  
+ is subsumed by (i.e., merges with) an older class as the sequence proceeds from 
+  
+    
+      
+        
+          K
+          
+            j
+            −
+            1
+          
+        
+        →
+        
+          K
+          
+            j
+          
+        
+      
+    
+    {\displaystyle K_{j-1}\to K_{j}}
+  
+. That is to say, 
+  
+    
+      
+        
+          f
+          
+            p
+          
+          
+            i
+            ,
+            j
+            −
+            1
+          
+        
+        (
+        γ
+        )
+        ∉
+        
+          H
+          
+            p
+          
+          
+            i
+            −
+            1
+            ,
+            j
+            −
+            1
+          
+        
+      
+    
+    {\displaystyle f_{p}^{i,j-1}(\gamma )\notin H_{p}^{i-1,j-1}}
+  
+ but 
+  
+    
+      
+        
+          f
+          
+            p
+          
+          
+            i
+            ,
+            j
+          
+        
+        (
+        γ
+        )
+        ∈
+        
+          H
+          
+            p
+          
+          
+            i
+            −
+            1
+            ,
+            j
+          
+        
+      
+    
+    {\displaystyle f_{p}^{i,j}(\gamma )\in H_{p}^{i-1,j}}
+  
+. The determination that an older class persists if it merges with a younger class, instead of the other way around, is sometimes known as the Elder Rule.
+The indices 
+  
+    
+      
+        i
+        ,
+        j
+      
+    
+    {\displaystyle i,j}
+  
+ at which a homology class 
+  
+    
+      
+        γ
+      
+    
+    {\displaystyle \gamma }
+  
+ is born and at which it dies are known, respectively, as the birth and death indices of 
+  
+    
+      
+        γ
+      
+    
+    {\displaystyle \gamma }
+  
+. The difference 
+  
+    
+      
+        j
+        −
+        i
+      
+    
+    {\displaystyle j-i}
+  
+ is known as the index persistence of 
+  
+    
+      
+        γ
+      
+    
+    {\displaystyle \gamma }
+  
+, while the corresponding difference 
+  
+    
+      
+        
+          a
+          
+            j
+          
+        
+        −
+        
+          a
+          
+            i
+          
+        
+      
+    
+    {\displaystyle a_{j}-a_{i}}
+  
+ in function values corresponding to those indices is known as the persistence of 
+  
+    
+      
+        γ
+      
+    
+    {\displaystyle \gamma }
+  
+. If there exists no index at which 
+  
+    
+      
+        γ
+      
+    
+    {\displaystyle \gamma }
+  
+ dies, it is assigned an infinite death index. Thus, the persistence of each class can be represented as an interval in the extended real line 
+  
+    
+      
+        
+          R
+        
+        ∪
+        {
+        ±
+        ∞
+        }
+      
+    
+    {\displaystyle \mathbb {R} \cup \{\pm \infty \}}
+  
+ of either the form 
+  
+    
+      
+        [
+        
+          a
+          
+            i
+          
+        
+        ,
+        
+          a
+          
+            j
+          
+        
+        )
+      
+    
+    {\displaystyle [a_{i},a_{j})}
+  
+ or 
+  
+    
+      
+        [
+        
+          a
+          
+            i
+          
+          ′
+        
+        ,
+        ∞
+        )
+      
+    
+    {\displaystyle [a_{i}',\infty )}
+  
+. Since, over an infinite field, infinitely many classes share the same persistence, the collection of such intervals over all classes does not give meaningful multiplicities for a multiset of intervals. Instead, such multiplicities and a multiset of intervals in the extended real line are given by the structure theorem of persistent homology. This multiset is known as the persistence barcode.
+
+
+== Canonical form ==
+Concretely, the structure theorem states that for any filtered complex over a field 
+  
+    
+      
+        F
+      
+    
+    {\displaystyle F}
+  
+, there exists a linear transformation that preserves the filtration and converts the filtered complex into so-called canonical form, a canonically defined direct sum of filtered complexes of two types: two-dimensional complexes with trivial homology 
+  
+    
+      
+        d
+        (
+        
+          e
+          
+            
+              a
+              
+                j
+              
+            
+          
+        
+        )
+        =
+        
+          e
+          
+            
+              a
+              
+                i
+              
+            
+          
+        
+      
+    
+    {\displaystyle d(e_{a_{j}})=e_{a_{i}}}
+  
+ and one-dimensional complexes with trivial differential 
+  
+    
+      
+        d
+        (
+        
+          e
+          
+            
+              a
+              
+                i
+              
+              ′
+            
+          
+        
+        )
+        =
+        0
+      
+    
+    {\displaystyle d(e_{a'_{i}})=0}
+  
+.
+
+
+== Persistence diagram ==
+
+Geometrically, a barcode can be plotted as a multiset of points (with possibly infinite coordinates) $(a_{i},a_{j})$ in the extended plane $\left(\mathbb{R} \cup \{\pm \infty \}\right)^{2}$. By the above definitions, each point will lie above the diagonal, and the distance to the diagonal is exactly equal to the persistence of the corresponding class times $\frac{1}{\sqrt{2}}$. This construction is known as the persistence diagram, and it provides a way of visualizing the structure of the persistence of homology classes in the sequence of persistent homology groups.
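+As a short check of the factor above (a sketch, taking the persistence of a class with finite interval $[a_{i},a_{j})$ to be $a_{j}-a_{i}$): the diagonal is the line $y=x$, so the Euclidean distance from the point $(a_{i},a_{j})$ with $a_{i}<a_{j}$ to it is
+$$\frac{|a_{j}-a_{i}|}{\sqrt{1^{2}+(-1)^{2}}}=\frac{a_{j}-a_{i}}{\sqrt{2}},$$
+that is, the persistence times $\frac{1}{\sqrt{2}}$. For example, the interval $[1,3)$ gives the point $(1,3)$, which lies at distance $2/\sqrt{2}=\sqrt{2}$ from the diagonal.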
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Photoanalysis-0.md b/data/en.wikipedia.org/wiki/Photoanalysis-0.md
new file mode 100644
index 000000000..c118062d0
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Photoanalysis-0.md
@@ -0,0 +1,71 @@
+---
+title: "Photoanalysis"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Photoanalysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:00.074890+00:00"
+instance: "kb-cron"
+---
+
+Photoanalysis (or photo analysis) refers to the study of pictures to compile various types of data, for example, to measure the size distribution of virtually anything that can be captured by photo. Photoanalysis technology has changed the way mines and mills quantify fragmented material.
+Images are an effective way to document conditions before, after, and even during blasting activities. The technology is advancing at a high rate, and lenses, storage media memory, light sensitivity and resolution have been improving steadily. Today's digital cameras and camcorders include high-resolution optics, compact size, automatic time and date stamps, good battery life, shutters to freeze motion, and computers to autofocus and eliminate jitter using image stabilization.
+
+
+== Mining ==
+Photoanalysis in mining operations can provide an automated system that forewarns a company of potential problems with materials, leading to cost savings and reduced damage caused by oversized materials. It can also help determine the effectiveness of blasts.
+A company can use this technology to monitor materials moving on a conveyor belt in an underground environment, to measure piles left over from a blast, and even measure the amount of material being carried by dump trucks or vessels to a destination.
+Photoanalysis is being used on SAG mills worldwide to control the size of rock being crushed. Companies are using this technology to determine the size of particles being processed in the SAG mill.[1] Having oversize material entering the SAG mill makes an operation less efficient, costing companies money in electrical and maintenance costs. Photoanalysis technology can eliminate unwanted material before it enters the mill, keeping rock crushing costs low.
+
+
+== Forestry ==
+
+Wood chip size can affect the overall quality of a product. With automated photoanalysis systems, companies can remove any unwanted wrong-size particles without stopping their mill process.
+Photoanalysis can affect how efficiently forestry companies operate. In mills worldwide, photoanalysis technology is improving the use of lumber products, cutting back on the number of trees used, and saving companies money through quality control optimization.[2]
+With the current downturn in the North American forestry industry, operators are looking at making their mills more efficient and effective when processing materials. Photoanalysis technology helps identify any weaknesses in the process by continuously monitoring different sections of an operation.
+
+
+== Agriculture ==
+
+Agricultural companies can, using photoanalysis, monitor conveyor belts of food without contaminating the product by touching it. Other benefits of photoanalysis systems include:
+
+Automated removal of any unwanted material on food conveyor
+Improved quality control for the most important parts of the agricultural process
+Pinpoint accuracy that helps the efficiency and effectiveness of product handling techniques
+The importance of photoanalysis technology is being noticed by the agricultural industry as it identifies any unwanted materials going through the process. For example, if a mouse is on a conveyor of corn, photoanalysis technology would be able to identify the unwanted object and remove it before it contaminates the whole process.
+
+
+== Origins of photoanalysis technology ==
+Photoanalysis technology was created by using the Waterloo Image Enhancement Process in the 1980s. After further development of the imaging process with explosives producer DuPont, engineers Tom Palangio and Takis Katsabanis began selling photoanalysis software commercially. They later renamed the process WipFrag, standing for Waterloo Image Process Fragmentation.
+Today, photoanalysis technology has evolved into stabilized and portable systems that can automatically capture and analyze results instantly.  Thousands of these products are currently being used around the world to measure fragmented material.
+
+
+== Fragmentation analysis ==
+Fragmentation analysis is becoming a popular term in mining, agricultural and forestry industries. With the majority of money in these industries directed towards the proper sizing of materials, companies are using fragmentation analysis to determine various factors within an operation.[3]
+The two main ways a company keeps track of fragmented material are through manual and automated sieving procedures. Manual sieving involves extracting a sample of material to analyze the size distribution. The results can be tabulated within two days.  Automated sieving is an advanced way of sieving materials running through a process. Without having to extract the material, photoanalysis can take place, allowing for immediate results with pinpoint accuracy.
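
The size distribution that sieving reports can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `percent_passing`, the sample diameters, and the sieve sizes are hypothetical, and each fragment is treated as a sphere so that its volume (used here as a stand-in for mass) scales with the cube of its diameter. It is not the method of any particular photoanalysis product.

```python
def percent_passing(diameters_mm, sieve_sizes_mm):
    """Cumulative percent (by estimated volume) finer than each sieve size."""
    # Assumption: each fragment is treated as a sphere, so volume scales with d**3.
    volumes = [d ** 3 for d in diameters_mm]
    total = sum(volumes)
    if total == 0:
        raise ValueError("no measurable fragments")
    result = {}
    for sieve in sorted(sieve_sizes_mm):
        # Volume of all fragments strictly smaller than this sieve opening.
        passing = sum(v for d, v in zip(diameters_mm, volumes) if d < sieve)
        result[sieve] = 100.0 * passing / total
    return result


# Hypothetical fragment diameters (mm), as might be measured from an image.
sizes = [10, 20, 20, 40]
curve = percent_passing(sizes, [15, 25, 50])
print(curve)
```

Weighting by estimated volume rather than by count keeps a few large fragments from being hidden by many small ones, which is the situation oversize detection is concerned with.
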
+
+
+== Blast Fragmentation Software ==
+Operators are using fragmentation analysis to determine the effectiveness of various blasts. With automated sieving technology, workers can track the success of these blasts and receive instant results. Companies are using these results to determine what blasting method yielded the best results for their specific operation. The common variables associated with blast optimization are the provided Particle Size Distribution (PSD) from a shovel fragmentation system, geology including rock type and fracturing, and energy factor.
+By using photoanalysis the fragmented materials can be monitored, offering pinpoint accuracy and allowing mine operators to make adjustments to future blasting procedures. See Optical Granulometry to view the automated sieving process.
+
+
+== Pre-crushing analysis ==
+
+Maintenance costs can be significantly reduced if an operation focuses on the fragmentation of the particles passing through their process. Automated sieving systems can detect and help remove any oversize material before it enters the crusher and causes maintenance problems. It also helps determine the effectiveness of the mining process prior to crushing; the sizing of material is always a critical part of operations in the mining, forestry and agricultural industries.
+Having an analysis taking place at every major point in an operation allows for the proper tracking of material being processed. Engineers can then determine what part of the process needs improving based solely on the size of material. 
+
+
+== Post-crushing analysis ==
+Measuring how effective industrial crushers are can help save a company millions of dollars in energy costs on an annual basis. There are two components that affect a typical crusher: the size of the input material, and the speed at which the crusher is moving. If the user can find a perfect balance between these two components, the materials will be crushed to the right size in the shortest time possible.
+Meeting the material standards set by governments and large companies can be hard. Having a post-crushing analysis taking place ensures that no oversize material gets shipped, eliminating the chance of getting fined for not meeting industry specifications.
+
+
+== See also ==
+Optical granulometry for more information on the photoanalysis process
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Pivot_table-0.md b/data/en.wikipedia.org/wiki/Pivot_table-0.md
new file mode 100644
index 000000000..1a2dac356
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Pivot_table-0.md
@@ -0,0 +1,45 @@
+---
+title: "Pivot table"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Pivot_table"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:01.231074+00:00"
+instance: "kb-cron"
+---
+
+A pivot table is a table of values which are aggregations of groups of individual values from a more extensive table (such as from a database, spreadsheet, or business intelligence program) within one or more discrete categories. The aggregations or summaries of the groups of the individual terms might include sums, averages, counts, or other statistics. A pivot table is the outcome of the statistical processing of tabularized raw data and can be used for decision-making.
+Although pivot table is a generic term, Microsoft held a trademark on the term in the United States from 1994 to 2020.
+
+== History ==
+In their book Pivot Table Data Crunching, Bill Jelen and Mike Alexander refer to Pito Salas as the "father of pivot tables". While working on a concept for a new program that would eventually become Lotus Improv, Salas noted that spreadsheets have patterns of data. A tool that could help the user recognize these patterns would help to build advanced data models quickly. With Improv, users could define and store sets of categories, then change views by dragging category names with the mouse. This core functionality would provide the model for pivot tables.
+Lotus Development released Improv in 1991 on the NeXT platform. A few months after the release of Improv, Brio Technology published a standalone Macintosh implementation, called DataPivot (with technology eventually patented in 1999). Borland purchased the DataPivot technology in 1992 and implemented it in their own spreadsheet application, Quattro Pro.
+In 1993 the Microsoft Windows version of Improv appeared. Early in 1994 Microsoft Excel 5 brought a new functionality called a "PivotTable" to market. Microsoft further improved this feature in later versions of Excel:
+
+Excel 97 included a new and improved PivotTable Wizard, the ability to create calculated fields, and new pivot cache objects that allow developers to write Visual Basic for Applications macros to create and modify pivot tables
+Excel 2000 introduced "Pivot Charts" to represent pivot-table data graphically
+Office 365 added the PIVOTBY function to Excel, allowing users to create a summary of data via a function rather than by building a pivot table.
+In 2007 Oracle Corporation made PIVOT and UNPIVOT operators available in Oracle Database 11g.
+
+== Mechanics ==
+For typical data entry and storage, data usually appear in flat tables, meaning that they consist of only columns and rows, as in the following portion of a sample spreadsheet showing data on shirt types:
+
+While tables such as these can contain many data items, it can be difficult to get summarized information from them. A pivot table can help quickly summarize the data and highlight the desired information. The usage of a pivot table is extremely broad and depends on the situation. The first question to ask is, "What am I seeking?" In the example here, let us ask, "How many Units did we sell in each Region for every Ship Date?":
+
+A pivot table usually consists of row, column and data (or fact) fields. In this case, the column is ship date, the row is region and the data we would like to see is (sum of) units. These fields allow several kinds of aggregations, including: sum, average, standard deviation, count, etc. In this case, the total number of units shipped is displayed here using a sum aggregation.
+
+== Implementation ==
+Using the example above, the software will find all distinct values for Region. In this case, they are: North, South, East, West. Furthermore, it will find all distinct values for Ship Date. Based on the aggregation type, sum, it will summarize the fact, the quantities of Units, and display them in a multidimensional chart. In the example above, the first datum is 66. This number was obtained by finding all records where both Region was East and Ship Date was 2005-01-31, and adding the Units of that collection of records (i.e., cells E2 to E7) together to get a final result.
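
The grouping-and-summing step described above can be sketched in plain Python. This is only an illustration: the rows below are made-up sample records (not the article's spreadsheet, so the totals differ from the 66 quoted above), and the helper name `pivot_sum` is hypothetical rather than part of any spreadsheet program's API.

```python
from collections import defaultdict

# Hypothetical flat-table records: (region, ship_date, units).
rows = [
    ("East", "2005-01-31", 12),
    ("East", "2005-01-31", 10),
    ("East", "2005-02-28", 7),
    ("West", "2005-01-31", 5),
    ("North", "2005-02-28", 9),
]


def pivot_sum(records):
    """Sum units for every (row field, column field) combination."""
    totals = defaultdict(int)
    for region, ship_date, units in records:
        totals[(region, ship_date)] += units
    # Distinct values of each field become the row and column labels.
    regions = sorted({r for r, _, _ in records})
    dates = sorted({d for _, d, _ in records})
    return regions, dates, dict(totals)


regions, dates, totals = pivot_sum(rows)
for region in regions:
    # Combinations with no matching records are shown as 0.
    print(region, [totals.get((region, d), 0) for d in dates])
```

Swapping the `+=` for a counter or a running list of values would give the count or average aggregations mentioned earlier.
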
+Pivot tables are not created automatically. For example, in Microsoft Excel one must first select all of the data in the original table and then go to the Insert tab and select "Pivot Table" (or "Pivot Chart"). The user then has the option of either inserting the pivot table into an existing sheet or creating a new sheet to house the pivot table. A pivot table field list is provided to the user which lists all the column headers present in the data. For instance, if a table represents sales data of a company, it might include Date of sale, Sales person, Item sold, Color of item, Units sold, Per unit price, and Total price. This makes the data more readily accessible.
+
+The fields that would be created will be visible on the right hand side of the worksheet. By default, the pivot table layout design will appear below this list.
+Pivot Table fields are the building blocks of pivot tables. Each of the fields from the list can be dragged on to this layout, which has four options:
+
+Filters
+Columns
+Rows
+Values
+Some uses of pivot tables are related to the analysis of questionnaires with optional responses, but some implementations of pivot tables do not allow these use cases. For example, the implementation in LibreOffice Calc since 2012 is not able to process empty cells.
+
+=== Filters ===
+A report filter is used to apply a filter to an entire table. For example, if the "Color of Item" field is dragged to this area, then the table constructed will have a report filter inserted above the table. This report filter will have drop-down options (Black, Red, and White in the example above). When an option is chosen from this drop-down list ("Black" in this example), then the table that would be visible will contain only the data from those rows that have "Color of Item" equal to "Black".
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Pivot_table-1.md b/data/en.wikipedia.org/wiki/Pivot_table-1.md
new file mode 100644
index 000000000..d4bd3e894
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Pivot_table-1.md
@@ -0,0 +1,64 @@
+---
+title: "Pivot table"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Pivot_table"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:01.231074+00:00"
+instance: "kb-cron"
+---
+
+=== Columns ===
+Column labels are used to apply a filter to one or more columns that have to be shown in the pivot table. For instance, if the "Salesperson" field is dragged to this area, then the table constructed will have values from the column "Salesperson", i.e., one will have a number of columns equal to the number of salespersons. There will also be one added column of "Grand Total". In the example above, this instruction will create five columns in the table: one for each salesperson, and Grand Total. There will be a filter above the data (column labels) from which one can select or deselect a particular salesperson for the pivot table.
+This table will not have any numerical values, as no numerical field is selected, but when it is selected, the values will automatically get updated in the column of "Grand Total".
+
+=== Rows ===
+Row labels are used to apply a filter to one or more rows that have to be shown in the pivot table. For instance, if the "Salesperson" field is dragged to this area, then the table constructed will have values from the column "Salesperson", i.e., one will have a number of rows equal to the number of salespersons. There will also be one added row of "Grand Total". In the example above, this instruction will create five rows in the table: one for each salesperson, and Grand Total. There will be a filter above the data (row labels) from which one can select or deselect a particular salesperson for the pivot table.
+This table will not have any numerical values, as no numerical field is selected, but when it is selected, the values will automatically get updated in the row of "Grand Total".
+
+=== Values ===
+This usually takes a field that has numerical values that can be used for different types of calculations. However, using text values would also not be wrong; instead of Sum, it will give a count. So, in the example above, if the "Units sold" field is dragged to this area along with the row label of "Salesperson", then the instruction will add a new column, "Sum of units sold", which will have values against each salesperson.
+
+== Application support ==
+Pivot tables or pivot functionality are an integral part of many spreadsheet applications and some database software, as well as being found in other data visualization tools and business intelligence packages.
+
+=== Spreadsheets ===
+Microsoft Excel supports PivotTables, which can be visualized through PivotCharts.
+Apache POI
+LibreOffice Calc and OpenOffice Calc support pivot tables. Prior to version 3.4, this feature was named "DataPilot".
+Calligra Sheets supports pivot tables.
+Google Sheets natively supports pivot tables.
+Numbers, from Apple Inc., gained pivot table support in version 11.2.
+
+=== Database support ===
+PostgreSQL, an object–relational database management system, allows the creation of pivot tables using the tablefunc module.
+MariaDB, a MySQL fork, allows pivot tables using the CONNECT storage engine.
+Microsoft Access supports pivot queries under the name "crosstab" query. 
+Microsoft SQL Server supports pivot as of SQL Server 2016 with the FROM...PIVOT keywords
+Oracle Database supports the PIVOT operation.
+Some popular databases that do not directly support pivot functionality, such as SQLite, can usually simulate pivot functionality using embedded functions, dynamic SQL or subqueries. The issue with pivoting in such cases is usually that the number of output columns must be known at the time the query starts to execute; for pivoting this is not possible as the number of columns is based on the data itself. Therefore, the names must be hard coded or the query to be executed must itself be created dynamically (meaning, prior to each use) based upon the data.
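
The dynamic-SQL workaround described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a recommended production pattern: the `sales` table, its columns, and its rows are hypothetical. The query text is generated from the data, with one `SUM(CASE ...)` output column per distinct ship date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, ship_date TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "2005-01-31", 12),
        ("East", "2005-02-28", 7),
        ("West", "2005-01-31", 5),
        ("West", "2005-01-31", 3),
    ],
)

# Step 1: read the distinct column values from the data itself.
dates = [d for (d,) in conn.execute(
    "SELECT DISTINCT ship_date FROM sales ORDER BY ship_date")]

# Step 2: generate one SUM(CASE ...) output column per distinct value.
# The values are bound as parameters; only the column aliases are
# interpolated, and double quotes inside them are escaped.
columns = ", ".join(
    'SUM(CASE WHEN ship_date = ? THEN units ELSE 0 END) AS "{}"'.format(
        d.replace('"', '""'))
    for d in dates
)
query = "SELECT region, {} FROM sales GROUP BY region ORDER BY region".format(columns)

# Step 3: run the generated query.
pivot = conn.execute(query, dates).fetchall()
for row in pivot:
    print(row)
```

Because the column list is derived from a first query, the statement has to be regenerated whenever new ship dates appear in the data, which is the limitation the paragraph above points out.
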
+
+=== Web applications ===
+Several JavaScript UI frameworks and web application libraries provide components for embedding pivot tables in web applications.
+
+ZK, an Ajax framework, allows embedding pivot tables in web applications.[citation needed]
+Webix, a JavaScript UI library, includes a Pivot component for embedding interactive pivot table functionality in web applications.
+
+=== Programming languages and libraries ===
+Programming languages and libraries suited to work with tabular data contain functions that allow the creation and manipulation of pivot tables.
+
+Python data analysis toolkit pandas has the function pivot_table and the xs method useful to obtain sections of pivot tables.
+R has the Tidyverse metapackage, which contains a collection of tools providing pivot table functionality, as well as the pivottabler package. The example specific to this article can be implemented in tidyr (in the Tidyverse metapackage) directly via the pivot_wider function.
+
+== Online analytical processing ==
+Excel pivot tables include the feature to directly query an online analytical processing (OLAP) server for retrieving data instead of getting the data from an Excel spreadsheet. In this configuration, a pivot table is a simple client of an OLAP server. Excel's PivotTable allows connecting not only to Microsoft's Analysis Services, but also to any OLAP server compliant with the XML for Analysis (XMLA) standard.
+
+== See also ==
+
+== References ==
+
+== Further reading ==
+A Complete Guide to PivotTables: A Visual Approach (ISBN 1-59059-432-0) (in-depth review at slashdot.org)
+Excel 2007 PivotTables and PivotCharts: Visual blueprint (ISBN 978-0-470-13231-9)
+Pivot Table Data Crunching (Business Solutions) (ISBN 0-7897-3435-4)
+Beginning Pivot Tables in Excel 2007 (ISBN 1-59059-890-3)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Post_hoc_analysis-0.md b/data/en.wikipedia.org/wiki/Post_hoc_analysis-0.md
new file mode 100644
index 000000000..f1f747ceb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Post_hoc_analysis-0.md
@@ -0,0 +1,51 @@
+---
+title: "Post hoc analysis"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Post_hoc_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:02.388216+00:00"
+instance: "kb-cron"
+---
+
+In a scientific study, post hoc analysis (from Latin post hoc, "after this") consists of statistical analyses that were specified after the data were seen. A post hoc analysis is usually used to explore specific, statistically significant differences between the means of three or more independent groups, differences detected with an analysis of variance (ANOVA). An ANOVA does not identify which group(s) differ; for that, a post hoc analysis is required.
+Because each post hoc analysis is effectively a statistical test, conducting multiple post hoc comparisons introduces a family-wise error rate problem, which is a type of multiple testing problem. This increases the likelihood of false positives unless corrected.
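+As a rough illustration (assuming $m$ independent comparisons, each tested at level $\alpha$, which real post hoc comparisons often are not), the family-wise error rate is
+$$\mathrm{FWER}=1-(1-\alpha)^{m}.$$
+With $k$ groups there are $m=k(k-1)/2$ pairwise comparisons, so $k=4$ and $\alpha=0.05$ give $m=6$ and $\mathrm{FWER}=1-0.95^{6}\approx 0.26$.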
+Post hoc tests are follow-up tests performed after a significant ANOVA result to identify where the differences lie (which specific groups differ). To compensate for the multiple testing problem, multiple testing correction procedures are sometimes used, but that is often difficult or impossible to do precisely. Post hoc analysis that is conducted and interpreted without adequate consideration of this problem is sometimes called data dredging (p-hacking) by critics because the statistical associations that it finds are often spurious. In other words, findings from data dredging are invalid or not trustworthy.
+Post hoc analyses are acceptable when transparently reported as exploratory. In other words, post hoc analyses are not inherently unethical. The main requirement for their ethical use is simply that their results not be misrepresented as the original hypothesis. Modern editions of scientific manuals have clarified this point; for example, APA style now specifies that "hypotheses should now be stated in three groupings: preplanned–primary, preplanned–secondary, and exploratory (post hoc). Exploratory hypotheses are allowable, and there should be no pressure to disguise them as if they were preplanned."
+
+
+== Types of post hoc analysis ==
+Types or categories of post hoc analyses include: 
+
+Pairwise comparisons: Tests all possible pairs
+Trend analysis: Tests for linear or quadratic trends across ordered groups
+Simple effects analysis: Examines effects within factorial ANOVA
+Interaction probing: Analyzes interaction constraints within factorial ANOVA
+Restricted Sets of Contrasts: Testing smaller families of comparisons
+In addition, a subgroup analysis examines whether findings differ between discrete categories of subjects in the sample. This approach is common in clinical and observational studies.
+
+
+== Common post hoc tests ==
+Common post hoc tests include:
+
+Fisher's least significant difference
+Holm-Bonferroni Procedure
+Newman-Keuls
+Rodger's Method
+Scheffé's Method
+Tukey's Test and Honestly Significant Difference (HSD) (see also: Studentized Range Distribution)
+However, with the exception of Scheffé's Method, these tests should be specified "a priori" despite being called "post-hoc" in conventional usage. For example, a difference between means could be significant with the Holm-Bonferroni method but not with the Tukey test, and vice versa. It would be poor practice for a data analyst to choose which of these tests to report based on which gave the desired result.
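
As an illustration of one procedure from the list above, the Holm-Bonferroni step-down rule can be sketched as follows. The function name `holm_bonferroni` and the p-values are hypothetical; the rule itself is the standard one, in which the k-th smallest of m p-values is compared with alpha / (m - k + 1) and testing stops at the first comparison that fails.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return reject


# Hypothetical p-values from four pairwise comparisons.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(decisions)
```

Here 0.03 and 0.04 would each pass an uncorrected 0.05 threshold but are not rejected after correction, which shows how the choice of procedure can change which differences are reported as significant.
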
+
+
+== Causes ==
+Sometimes the temptation to engage in post hoc analysis is motivated by a desire to produce positive results or see a project as successful. In the case of pharmaceutical research, there may be significant financial consequences to a failed trial.
+
+
+== See also ==
+HARKing
+Testing hypotheses suggested by the data
+Nemenyi test
+Outcome switching
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Principle_of_faunal_succession-0.md b/data/en.wikipedia.org/wiki/Principle_of_faunal_succession-0.md
new file mode 100644
index 000000000..dce4dc31f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Principle_of_faunal_succession-0.md
@@ -0,0 +1,28 @@
+---
+title: "Principle of faunal succession"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Principle_of_faunal_succession"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:40.963821+00:00"
+instance: "kb-cron"
+---
+
+The principle of faunal succession, also known as the law of faunal succession, is based on the observation that sedimentary rock strata contain fossilized flora and fauna, and that these fossils succeed each other vertically in a specific, reliable order that can be identified over wide horizontal distances. A fossilized Neanderthal bone (less than 500,000 years old) will never be found in the same stratum as a fossilized Megalosaurus (about 160 million years old), for example, because Neanderthals and megalosaurs lived during different geological periods, separated by millions of years. This allows for strata to be identified and dated by the fossils found within.
+This principle, which received its name from the English geologist William Smith, is of great importance in determining the relative age of rocks and strata.  The fossil content of rocks together with the law of superposition helps to determine the time sequence in which sedimentary rocks were laid down.
+Evolution explains the observed faunal and floral succession preserved in rocks. Faunal succession was documented by Smith in England during the first decade of the 19th century, and concurrently in France by Cuvier (with the assistance of the mineralogist Alexandre Brongniart). Archaic biological features and organisms are succeeded in the fossil record by more modern versions.  For instance, paleontologists investigating the evolution of birds predicted that feathers would first be seen in primitive forms on flightless predecessor organisms such as feathered dinosaurs. This is precisely what has been discovered in the fossil record: simple feathers, incapable of supporting flight, are succeeded by increasingly large and complex feathers.
+In practice, the most useful diagnostic species are those with the fastest rate of species turnover and the widest distribution; their study is termed biostratigraphy, the science of dating rocks by using the fossils contained within them. In Cenozoic strata, fossilized tests of foraminifera are often used to determine faunal succession on a refined scale, each biostratigraphic unit (biozone) being a geological stratum that is defined on the basis of its characteristic fossil taxa. An outline microfaunal zonal scheme based on both foraminifera and ostracoda was compiled by M. B. Hart (1972).
+Earlier fossil life forms are simpler than more recent forms, and more recent fossil forms are more similar to living forms (principle of faunal succession).
+
+
+== See also ==
+
+Index fossil
+Law of superposition
+Principle of cross-cutting relationships
+Principle of lateral continuity
+Principle of original horizontality
+History of paleontology
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Problem_of_the_speckled_hen-0.md b/data/en.wikipedia.org/wiki/Problem_of_the_speckled_hen-0.md
new file mode 100644
index 000000000..a39b82489
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Problem_of_the_speckled_hen-0.md
@@ -0,0 +1,18 @@
+---
+title: "Problem of the speckled hen"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Problem_of_the_speckled_hen"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:13.885766+00:00"
+instance: "kb-cron"
+---
+
+In the theory of empirical knowledge, the problem of the speckled hen is whether a single immediate observation of a speckled hen provides certain knowledge of the number of speckles observed. Clearly, this is not an isolated example, and therefore it is of fundamental nature. Philosophically, this problem probes the limits of knowledge by acquaintance: one is unable to know with certainty the existence of determinate things in one's experience merely by virtue of the experience.
+Roderick Chisholm attributes it to Gilbert Ryle, who suggested it to A. J. Ayer. It is viewed as a criticism of the view expressed by C. I. Lewis that there can never be "positive bafflement in the presence of the immediate, because there is here no question which fails to find an answer."
+Joseph Heath remarks that this problem is one of the "descendants of Descartes's 'chiliagon' argument in the sixth of his Meditations".
+A. J. Ayer suggested that if we are unable to enumerate speckles accurately, then it is incorrect to suggest that the "sense-data" provides a definite number of speckles despite the fact that the hen does have a definite number of them, clearly outlined. In Ayer's words, speckles are enumerable only if in fact they have been enumerated.
+A number of philosophers analyzed the merits of this proposition. Chisholm concludes that the problem of the speckled hen emphasizes the fact that there are basic propositions (synthetic propositions which do not refer beyond the content of the immediate experience) that are necessarily imprecise.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Proof_(truth)-0.md b/data/en.wikipedia.org/wiki/Proof_(truth)-0.md
new file mode 100644
index 000000000..010a3d260
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Proof_(truth)-0.md
@@ -0,0 +1,41 @@
+---
+title: "Proof (truth)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Proof_(truth)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:15.017890+00:00"
+instance: "kb-cron"
+---
+
+A proof is sufficient evidence or a sufficient argument for the truth of a proposition.
+The concept applies in a variety of disciplines, with both the nature of the evidence or justification and the criteria for sufficiency being area-dependent. In the area of oral and written communication such as conversation, dialog, rhetoric, etc., a proof is a persuasive perlocutionary speech act, which demonstrates the truth of a proposition. In any area of mathematics defined by its assumptions or axioms, a proof is an argument establishing a theorem of that area via accepted rules of inference starting from those axioms and from other previously established theorems. The subject of logic, in particular proof theory, formalizes and studies the notion of formal proof. In some areas of epistemology and theology, the notion of justification plays approximately the role of proof, while in jurisprudence the corresponding term is evidence, with "burden of proof" as a concept common to both philosophy and law.
+In most disciplines, evidence is required to prove something. Evidence is drawn from the experience of the world around us, with science obtaining its evidence from nature, law obtaining its evidence from witnesses and forensic investigation, and so on.   A notable exception is mathematics, whose proofs are drawn from a mathematical world begun with axioms and further developed and enriched by theorems proved earlier.
+Exactly what evidence is sufficient to prove something is also strongly area-dependent, usually with no absolute threshold of sufficiency at which evidence becomes proof. In law, the same evidence that may convince one jury may not persuade another. Formal proof provides the main exception, where the criteria for proofhood are ironclad and it is impermissible to defend any step in the reasoning as "obvious" (except for the necessary ability of both the prover and the person to whom the proof is addressed to correctly identify any symbol used in the proof); for a well-formed formula to qualify as part of a formal proof, it must be the result of applying a rule of the deductive apparatus of some formal system to the previous well-formed formulae in the proof sequence.
+Proofs have been presented since antiquity. Aristotle used the observation that patterns of nature never display the machine-like uniformity of determinism as proof that chance is an inherent part of nature. On the other hand, Thomas Aquinas used the observation of the existence of rich patterns in nature as proof that nature is not ruled by chance.
+Proofs need not be verbal. Before Copernicus, people took the apparent motion of the Sun across the sky as proof that the Sun went round the Earth. Suitably incriminating evidence left at the scene of a crime may serve as proof of the identity of the perpetrator. Conversely, a verbal entity need not assert a proposition to constitute a proof of that proposition. For example, a signature constitutes direct proof of authorship; less directly, handwriting analysis may be submitted as proof of authorship of a document. Privileged information in a document can serve as proof that the document's author had access to that information; such access might in turn establish the location of the author at a certain time, which might then provide the author with an alibi.
+
+
+== Proof vs evidence ==
+18th-century Scottish philosopher David Hume built on Aristotle's separation of belief from knowledge, recognizing that one can be said to "know" something only if one has firsthand experience with it, in a strict sense proof, while one can infer that something is true and therefore "believe" it without knowing, via evidence or supposition. This speaks to one way of separating proof from evidence: 
+If one cannot find their chocolate bar, and sees chocolate on their napping roommate's face, this evidence can cause one to believe their roommate ate the chocolate bar. But they do not know their roommate ate it. It may turn out that the roommate put the candy away when straightening up, but was thus inspired to go eat their own chocolate. Only if one directly experiences proof of the roommate eating it, perhaps by walking in on them doing so, would one have certain knowledge, in Hume's sense, that the roommate did it.
+In a more strict sense of sure knowledge, one may be unable to prove anything to a rational certainty beyond that of the existence of one's immediate sensory awareness. Descartes famously raised a similarly strict standard with his first principle Cogito, ergo sum (I think, therefore I am). While Descartes' larger project in Meditations on First Philosophy has knowledge of God and the external world—founded on the certainty of the cogito—as its aim, his legacy in doing so is to have shown that one cannot have such proof, because all perceptions could be false (such as under the evil demon or simulated reality hypotheses). One nevertheless can still have clear proof of the existence of one's thought, even if belief in the external world lacks the certainty of demonstration beyond that of one's own firsthand experience.
+
+
+== See also ==
+
+Mathematical proof
+Proof theory
+Proof of concept
+Provability logic
+Evidence, information which tends to determine or demonstrate the truth of a proposition
+Proof procedure
+Proof complexity
+Standard of proof
+Proving a negative
+Proof of impossibility – Category of mathematical proof
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Psychoanalytic_infant_observation-0.md b/data/en.wikipedia.org/wiki/Psychoanalytic_infant_observation-0.md
new file mode 100644
index 000000000..08fd7c2d7
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Psychoanalytic_infant_observation-0.md
@@ -0,0 +1,27 @@
+---
+title: "Psychoanalytic infant observation"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Psychoanalytic_infant_observation"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:00.301945+00:00"
+instance: "kb-cron"
+---
+
+Psychoanalytic infant observation is a distinct empirical case study method in psychoanalytic and psychotherapy training which was developed at the Tavistock Clinic in London by child psychoanalyst Esther Bick. In 1948 she collaborated with John Bowlby to develop the approach as part of psychotherapy training. It has since become an essential feature of pre-clinical training in child and adult psychotherapy, psychoanalysis and related fields throughout the Western world.
+Psychoanalytic infant observation usually involves observing an infant and mother weekly over a two-year period, beginning soon after birth and continuing until the child's second birthday. This naturalistic form of experiential enquiry provides a unique opportunity to sharpen and extend the observational skills of future therapists. Trainees learn first-hand how a relationship develops between babies and their family members, which enables them to think about how babies grow physically, mentally and emotionally. The experience of observing family life is invaluable for professionals who later work with complex and disturbing presentations.
+
+== Rationale for the method ==
+
+Infant Observation was the inspired initiative of Esther Bick. As a Child Psychoanalyst she pioneered this particular approach to studying babies in the midst of their family environment. In 1948, she began teaching at the Tavistock Clinic and in collaboration with John Bowlby she started the practice of observation as an integral part of psychotherapy training. This involved finding families about to welcome a newborn and gaining their consent to freely participate in the two-year project. It consisted in visiting a family to observe their infant from birth to two years. These weekly observations in the natural environment of the baby's home offered a vivid learning experience of child development. Observers came to appreciate the mutual influence of the developing relationship between mother and baby, father and siblings (if any). Importantly, the observer was also invited to consider the feelings aroused in themselves during the observation and how their presence in the home could be influencing events.
+Esther Bick's 1964 paper ‘Notes on infant observation in psycho-analytic training’ set out the model of infant observation and her view of how much can be learned from it — how to observe, the nature of early infantile anxiety, especially the baby's apparent fear of ‘falling to bits’, the impact of maternal anxiety and postnatal depression, and the significance of good observational capacities for future child analysts. She emphasized the gathering of data over time, the need to wait for meaning to emerge, and the observer's responsibility to respect their role as learner and to behave with tact and reliability.
+Bick's ideas took shape at the same time as Wilfred Bion’s work on ‘A theory of thinking’ and these two explorations of the emotional and cognitive dimensions of the early mother-child relationship are profoundly complementary. Both build on the work of Melanie Klein and her pioneering analysis of children.
+
+== Later Developments of the method ==
+Over the last fifty years courses for professionals working with children and families have made increasing use of infant and child observation as a central aspect of training. It has proved invaluable in increasing professional skills and in sensitising workers to the range of anxieties, difficulties and creative possibilities in each family.
+From 1960 to 1980 Martha Harris was head of the Child Psychotherapy service at the Tavistock Clinic. She was responsible for the expansion in the number of British and international trainees at the Tavistock and for developing the training into what became known as the "Tavistock Model". The model, in which infant observation continues to play a pre-eminent role, has been adopted with modifications across the UK and internationally: for example, GERPEN in France and at the Martha Harris Study Centres in Italy.
+Beginning in the 1980s, and initially supported by visiting staff from the Tavistock Clinic, courses in infant observation were developed to support the training of a wide range of professionals across the UK and across the West. Over time other components and seminars were added to develop a comprehensive programme leading to a post-graduate qualification. The post-graduate programme known as Psychoanalytic Observational Studies, which is run under the auspices of the Tavistock Clinic, is currently delivered in the UK in Belfast, Birmingham, Bristol, Devon, Oxford and Liverpool and in Italy in Florence, Genoa and Milan. In the UK, equivalent post-graduate programmes exist at the Anna Freud Centre and the British Psychotherapy Foundation in London, the Northern School of Child and Adolescent Psychotherapy with the University of Leeds, at the University of Northumbria in Newcastle and at Human Development Scotland with the University of Strathclyde in Glasgow. In the US, programmes have been run at the Washington School of Psychiatry in Washington, D.C., since 2004, and at Columbia University.
+
+== The Process of Observation ==
+
+Psychoanalytic infant observation generally involves weekly observation of an infant over a two-year period, from soon after birth until their second birthday. Trainees normally undertake the observation in the home setting for one hour per week at the same time in the week, to fit in with the family's schedule. Trainees are responsible for finding a baby to observe under the guidance of their tutor. New observers attend seminars to discuss the practicalities of setting up an observation and to learn about the process of finding a baby.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Psychoanalytic_infant_observation-1.md b/data/en.wikipedia.org/wiki/Psychoanalytic_infant_observation-1.md
new file mode 100644
index 000000000..50ea50f3a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Psychoanalytic_infant_observation-1.md
@@ -0,0 +1,54 @@
+---
+title: "Psychoanalytic infant observation"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Psychoanalytic_infant_observation"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:00.301945+00:00"
+instance: "kb-cron"
+---
+
+Every observation is written up in detail as soon after the observation as possible. This can often take about an hour to complete. Students discuss their observations in small group seminars, which take place weekly over two academic years. Each trainee has the opportunity to present their detailed observations to the group. The presentations are anonymised and no identifying features are used.
+The unique experience of psychoanalytic observation allows the trainee to observe a mother and baby, living through and resolving routine and difficult situations in their own ways, without any intervention from the observer. With the help of the seminar, the observer learns to process the inclination for judgmental and blaming thoughts which arise when anxiety is stirred. Along with developing sensitivity and precision in observation, the course teaches how to think freshly and inductively from observation, including trying to understand how the developing infant is making sense of his world.
+
+== The Young Child and Brief Observations ==
+
+Some courses and trainings, including those at the Tavistock Clinic, The Birmingham Trust for Psychoanalytic Psychotherapy and the Northern School of Child and Adolescent Psychotherapy also offer the chance to undertake an observation of a pre-school child (approximately two to four years old) in their family or in a nursery setting for an hour a week for one academic year. This gives an opportunity for an additional understanding of development through the experience of observation as the child starts to communicate verbally and non-verbally with other children and with adults outside the immediate family and takes a range of steps towards the world outside the family.
+Several courses provide the opportunity to undertake a brief infant or young child observation as a less intensive but still valuable training experience. (See, for example, Infant Mental Health and Early Intervention with Under Threes and their Parents.)
+
+== International Journal of Infant Observation and Its Applications ==
+Infant Observation, the journal, is published by Taylor and Francis and the current Editor is Trudy Klauber. The international journal publishes the best of the varied and original writing emerging from this field. It comprises case studies on infant and young child observation, research papers, and articles focusing on wider applications of the psychoanalytic observational method, including its relevance to reflective professional practice in fields such as social work, education and nursing. Papers are peer-reviewed. There is a developing body of research knowledge that draws upon the infant observation approach.
+
+== See also ==
+Psychoanalytic Study of the Child
+Selma Fraiberg
+Margaret Lowenfeld
+Joel Ryce-Menuhin
+James Robertson
+Sandplay Therapy
+
+== References ==
+
+== Bibliography ==
+Bick,  Esther. (1964) ‘Notes on infant observation in psycho-analytic training’. Reprinted in Collected Papers of Martha Harris and Esther Bick. Clunie Press, 1987.
+Harris, Martha. (1976). ‘The contribution of observation of mother-infant interaction and development to the equipment of a psychoanalyst’ Reprinted in M. H. Williams (ed.) (2011), pp. 117–132.
+Harris, Martha. (1977) ‘The Tavistock training and philosophy’. Reprinted in The Tavistock Model: Papers on Child Development and Psychoanalytic Training by Martha Harris and Esther Bick, ed. M. H. Williams (London: Harris Meltzer Trust/ Karnac, 2011), pp. 1–24.
+Pines, Malcolm (2009). "Mirroring and child development". Psychoanalytic Enquiry. 5 (2): 211–231. doi:10.1080/07351698509533585.
+Reid, Susan. (Ed.) (1997) Developments in Infant Observation: The Tavistock Model. Hove: Routledge
+Rustin, Margaret. (2009). 'Esther Bick's legacy of infant observation at the Tavistock – some reflections 60 years on', Infant Observation: International Journal of Infant Observation and Its Applications, 12(1), p. 32.
+Rustin, Michael. (2006). ‘Infant observation research: What have we learned so far?’ Infant Observation: International Journal of Infant Observation and Its Applications,  9 (1), pp. 35–52.
+Sternberg, Janine. (2005). Infant Observation at the Heart of Training. London: Karnac.
+Waddell,  Margot. (2013). ‘Infant observation in Britain: a Tavistock approach’. Infant Observation: International Journal of Infant Observation and Its Applications, 16(1), pp. 4–22.
+
+== External links ==
+Videos of NSCAP students talking about the experience of undertaking Psychoanalytic Infant Observation
+Tavistock Clinic
+International Journal of Infant Observation and Its Applications
+Esther Bick
+British Psychotherapy Foundation
+Birmingham Trust for Psychoanalytic Psychotherapy
+Northern School of Child and Adolescent Psychotherapy
+Human Development Scotland
+Anna Freud Centre
+Association of Child Psychotherapists
+Washington School of Psychiatry
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Qualitative_comparative_analysis-0.md b/data/en.wikipedia.org/wiki/Qualitative_comparative_analysis-0.md
new file mode 100644
index 000000000..7c20d76aa
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Qualitative_comparative_analysis-0.md
@@ -0,0 +1,23 @@
+---
+title: "Qualitative comparative analysis"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Qualitative_comparative_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:03.633787+00:00"
+instance: "kb-cron"
+---
+
+In statistics, qualitative comparative analysis (QCA) is a data analysis based on set theory to examine the relationship of conditions to outcome. QCA describes the relationship in terms of necessary conditions and sufficient conditions. The technique was originally developed by Charles Ragin in 1987 to study data sets that are too small for linear regression analysis but large enough for cross-case analysis.
+
+== Summary of technique ==
+In the case of categorical variables, QCA begins by listing and counting all types of cases which occur, where each type of case is defined by its unique combination of values of its independent and dependent variables. For instance, if there were four categorical variables of interest, {A,B,C,D}, and A and B were dichotomous (could take on two values), C could take on five values, and D could take on three, then there would be 2 × 2 × 5 × 3 = 60 possible types of observations determined by the possible combinations of variables, not all of which would necessarily occur in real life. By counting the number of observations that exist for each of the 60 unique combinations of variables, QCA can determine which descriptive inferences or implications are empirically supported by a data set. Thus, the input to QCA is a data set of any size, from small-N to large-N, and the output of QCA is a set of descriptive inferences or implications the data supports.
+In QCA's next step, inferential logic or Boolean algebra is used to simplify or reduce the number of inferences to the minimum set of inferences supported by the data. This reduced set of inferences is termed the "prime implicants" by QCA adherents. For instance, if the presence of conditions A and B is always associated with the presence of a particular value of D, regardless of the observed value of C, then the value that C takes is irrelevant. Thus, all five inferences involving A and B and any of the five values of C may be replaced by the single descriptive inference "(A and B) implies the particular value of D".
+To establish that the prime implicants or descriptive inferences derived from the data by the QCA method are causal requires establishing the existence of a causal mechanism using another method such as process tracing, formal logic, intervening variables, or established multidisciplinary knowledge. The method is used in social science and is based on the binary logic of Boolean algebra, and attempts to ensure that all possible combinations of variables that can be made across the cases under investigation are considered.
+
+== Motivation ==
+The technique of listing case types by potential variable combinations assists with case selection by making investigators aware of all possible case types that would need to be investigated, at a minimum, if they exist, in order to test a certain hypothesis or to derive new inferences from an existing data set. In situations where the available observations constitute the entire population of cases, this method alleviates the small N problem by allowing inferences to be drawn by evaluating and comparing the number of cases exhibiting each combination of variables. The small N problem arises when the number of units of analysis (e.g. countries) available is inherently limited. For example, a study where countries are the unit of analysis is limited in that there are only a limited number of countries in the world (fewer than 200), fewer than necessary for some (probabilistic) statistical techniques. By maximizing the number of comparisons that can be made across the cases under investigation, causal inferences are, according to Ragin, possible. This technique allows the identification of multiple causal pathways and interaction effects that may not be detectable via statistical analysis that typically requires its data set to conform to one model. Thus, it is the first step to identifying subsets of a data set conforming to a particular causal pathway based on the combinations of covariates prior to quantitative statistical analyses testing conformance to a model; it also helps qualitative researchers to correctly limit the scope of claimed findings to the type of observations they analyze.
+
+== Criticism ==
+As this is a logical (deterministic) and not a statistical (probabilistic) technique, with "crisp-set" QCA (csQCA), the original application of QCA, variables can only have two values, which is problematic as the researcher has to determine the values of each variable. For example, GDP per capita has to be divided by the researcher into two categories (e.g. low = 0 and high = 1). But as this variable is essentially a continuous variable, the division will always be arbitrary. A second, related problem is that the technique does not allow an assessment of the effect of the relative strengths of the independent variables (as they can only have two values). Ragin, and other scholars such as Lasse Cronqvist, have tried to deal with these issues by developing new tools that extend QCA, such as multi-value QCA (mvQCA) and fuzzy set QCA (fsQCA). Note: multi-value QCA is simply QCA applied to observations having categorical variables with more than two values. Crisp-set QCA can be considered a special case of multi-value QCA.
+Statistical methodologists have argued that QCA's strong assumptions render its findings both fragile and prone to type I error. Simon Hug argues that deterministic hypotheses and error-free measures are exceedingly rare in social science and uses Monte Carlo simulations to demonstrate the fragility of QCA results if either assumption is violated. Chris Krogslund, Donghyun Danny Choi, and Mathias Poertner further demonstrate that QCA results are highly sensitive to minor parametric and model-susceptibility changes and are vulnerable to type I error. Bear F. Braumoeller further explores the vulnerability of the QCA family of techniques to both type I error and multiple inference. Braumoeller also offers a formal test of the null hypothesis and demonstrates that even very convincing QCA findings may be the result of chance.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Qualitative_comparative_analysis-1.md b/data/en.wikipedia.org/wiki/Qualitative_comparative_analysis-1.md
new file mode 100644
index 000000000..1a3f86594
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Qualitative_comparative_analysis-1.md
@@ -0,0 +1,44 @@
+---
+title: "Qualitative comparative analysis"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Qualitative_comparative_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:03.633787+00:00"
+instance: "kb-cron"
+---
+
+== Response to criticisms ==
+QCA can be performed probabilistically or deterministically with observations of categorical variables. For instance, the existence of a descriptive inference or implication is supported deterministically by the absence of any counterexample cases to the inference; i.e. if a researcher claims condition X implies condition Y, then, deterministically, there must not exist any counterexample cases having condition X but not condition Y. However, if the researcher wants to claim that condition X is a probabilistic 'predictor' of condition Y in another similar set of cases, then the proportion of cases conforming to the inference, out of all cases having that same combination of conditions, can be required to meet a threshold value of, for example, 80% or higher. For each prime implicant that QCA outputs via its logical inference reduction process, the "coverage" (the percentage out of all observations that exhibit that implication or inference) and the "consistency" (the percentage of observations conforming to that combination of variables having that particular value of the dependent variable or outcome) are calculated and reported, and can be used as indicators of the strength of such an explorative probabilistic inference. In real-life complex societal processes, QCA enables the identification of multiple sets of conditions that are consistently associated with a particular output value in order to explore for causal predictors.
+Fuzzy set QCA aims to handle variables, such as GDP per capita, where the number of categories, decimal values of monetary units, becomes too large to use mvQCA, or in cases where uncertainty, ambiguity or measurement error in the classification of a case needs to be acknowledged.
+
+== Fields of use ==
+QCA is now used in many fields beyond political science, for which Ragin first developed the method. The method has been used in:
+
+Business (e.g. Romme 1995; Kask and Linton 2013; for a review see Misangyi et al. 2017)
+Information Systems Management (e.g. Lee et al. 2019; for a review see Mattke et al. 2021)
+Project Management (e.g. Invernizzi et al. 2020)
+Human behavior (e.g. Olya and Akhshik 2019)
+Innovation Management (e.g. Sukhov et al. 2018; Aşkun et al. 2021)
+Entrepreneurship (e.g. Linton and Kask 2017)
+Education (e.g. Stevenson 2013)
+Environmental sciences (e.g. Basurto 2013)
+Health research (e.g. Blackman 2013)
+Retailing (e.g. Johansson and Kask 2017)
+Tourism (e.g. Olya & Altinay 2015; Olya & Gavilyan, 2016; Olya & Mehran, 2017; Çizel et al. 2021)
+Political science (e.g. Bara 2014; Binder 2015; Schneider and Maerz 2017)
+
+== See also ==
+Quine–McCluskey algorithm
+CORA - Combinational Regularity Analysis
+Claudius Wagemann
+
+== References ==
+
+== Further reading ==
+Duşa, Adrian (2008-10-01) [September 2007]. "A mathematical approach to the boolean minimization problem". Quality & Quantity. 44: 99–113. doi:10.1007/s11135-008-9183-x. S2CID 123042755. Article number: 99 (2010). [1] (22 pages)
+Duşa, Adrian (2007). "Enhancing Quine-McCluskey" (PDF). University of Bucharest. Archived (PDF) from the original on 2020-05-12. Retrieved 2020-05-12. (16 pages) (NB. QCA, an open source, R based implementation used in the social sciences.)
+Schneider, Carsten Q. (2024), Set-Theoretic Multi-Method Research: A Guide to Combining QCA and Case Studies, Cambridge University Press. ISBN 978-1-009-30715-4
+
+== External links ==
+COMPASSS (COMPArative Methods for Systematic cross-caSe analySis), a website dedicated to qualitative comparative analysis
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Random_mapping-0.md b/data/en.wikipedia.org/wiki/Random_mapping-0.md
new file mode 100644
index 000000000..b4ad63f09
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Random_mapping-0.md
@@ -0,0 +1,21 @@
+---
+title: "Random mapping"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Random_mapping"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:04.806192+00:00"
+instance: "kb-cron"
+---
+
+In data analysis, random mapping (RM) is a fast dimensionality reduction method categorized as a feature extraction method. RM consists of generating a random matrix that is multiplied by each original vector, resulting in a reduced vector. When the data vectors are high-dimensional, it is computationally infeasible to use data analysis or pattern recognition algorithms which repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. In a text mining context, it has been demonstrated that the document classification accuracy obtained after the dimensionality has been reduced using a random mapping method will be almost as good as the original accuracy if the final dimensionality is sufficiently large (about 100 out of 6000). In fact, it can be shown that the inner product (similarity) between the mapped vectors closely follows the inner product of the original vectors.
+
+
+== See also ==
+Random variable
+Semantic mapping
+Random projection
+
+
+== References ==
+Kaski, S. Dimensionality reduction by random mapping: fast similarity computation for clustering. Proceedings of the 1998 IEEE International Joint Conference on Neural Networks, 1998. pp. 413–418. doi:10.1109/IJCNN.1998.682302
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Real-time_data-0.md b/data/en.wikipedia.org/wiki/Real-time_data-0.md
new file mode 100644
index 000000000..52f20005f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Real-time_data-0.md
@@ -0,0 +1,46 @@
+---
+title: "Real-time data"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Real-time_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:05.986758+00:00"
+instance: "kb-cron"
+---
+
+Real-time data (RTD) is information that is delivered immediately after collection. There is no delay in the timeliness of the information provided. Real-time data is often used for navigation or tracking. Such data is usually processed using real-time computing although it can also be stored for later or off-line data analysis.
+Real-time data is not the same as dynamic data. Real-time data can be dynamic (e.g. a variable indicating current location) or static (e.g. a fresh log entry indicating location at a specific time).
+
+
+== In economics ==
+Real-time economic data, and other official statistics, are often based on preliminary estimates, and therefore are frequently adjusted as better estimates become available. These later adjusted data are called "revised data". 
+The terms real-time economic data and real-time economic analysis were coined by Francis X. Diebold and Glenn D. Rudebusch.
+Rudebusch defined real-time analysis as 'the use of sequential information sets that were actually available as history unfolded.' Macroeconomist Athanasios Orphanides has argued that economic policy rules may have very different effects when based on error-prone real-time data (as they inevitably are in reality) than they would if policy makers followed the same rules but had more accurate data available.
+In order to better understand the accuracy of economic data and its effects on economic decisions, some economic organizations, such as the Federal Reserve Bank of St. Louis, Federal Reserve Bank of Philadelphia and the Euro-Area Business Cycle Network (EABCN), have made databases available that contain both real-time data and subsequent revised estimates of the same data.
+
+
+== In auctions ==
+Real-time bidding consists of programmatic real-time auctions that sell digital-ad impressions. Entities on both the buying and selling sides require almost instantaneous access to data in order to make decisions, forcing real-time data to the forefront of their needs. To support these needs, new strategies and technologies, such as Druid, have arisen and are quickly evolving.
+
+
+== See also ==
+Datafication
+Data mining
+Geographic information system
+Information privacy
+Management information system
+Online analytical processing
+Personal data service
+Personal Information Agent
+Real-time business intelligence
+Social information processing
+User activity monitoring
+
+
+== References ==
+
+
+== External links ==
+ALFRED: Archival Federal Reserve Economic Data, real-time data series at the Federal Reserve Bank of St. Louis
+Real-time data set for macroeconomists at the Federal Reserve Bank of Philadelphia
+Real-time database of the EABCN
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Real_world_data-0.md b/data/en.wikipedia.org/wiki/Real_world_data-0.md
new file mode 100644
index 000000000..1f527cd6b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Real_world_data-0.md
@@ -0,0 +1,55 @@
+---
+title: "Real world data"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Real_world_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:16.189791+00:00"
+instance: "kb-cron"
+---
+
+Real world data (RWD) in medicine is data derived from a number of sources that are associated with outcomes in a heterogeneous patient population in real-world settings, including but not limited to electronic health records, health insurance claims and patient surveys. While no universal definition of real world data exists, researchers typically understand RWD as distinct from data sourced from randomized clinical trials.
+
+
+== Real world data in healthcare ==
+Real-world data refer to observational data as opposed to data gathered in an experimental setting such as a randomized controlled trial (RCT). They are derived from electronic health records (EHRs), claims and billing activities, product and disease registries, etc. A systematic scoping review suggests that data quality dimensions and methods for RWD are not applied consistently in the literature, and as a result quality assessments are challenging due to the complex and heterogeneous nature of these data.
+The sources of RWD are only rarely interoperable, as each hospital-maintained EHR system is, by design, secured for patient privacy. Healthcare providers responsible for entering patient data into their EHR may agree to pooling that data with others, once it has been de-identified in accordance with privacy regulations such as HIPAA or GDPR. The result is a larger, more heterogeneous population for research, where trends and statistical associations may be more apparent. Results from analysis on aggregated RWD can inform the design of clinical study protocols or advance post-approval research.
+
+
+== Real world evidence ==
+
+When working with RWD, the goal is often to generate evidence. The term real world evidence (RWE) is closely related to RWD. RWE is defined by the FDA as "clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of RWD". An example of a study utilizing RWE is "Clinical Features and Outcomes of Coronavirus Disease 2019 Among People Who Have HIV in the United States: A Multi-center Study From a Large Global Health Research Network (TriNetX)". In this study, COVID-19 outcomes were compared between people with HIV and HIV-negative controls from a database of de-identified health records. The TriNetX platform allowed the researchers to compare the two groups on the incidence of hospitalizations, ICU admissions, ventilation and severe disease, to understand the impact COVID-19 infection has on those with HIV.
+Guides for reading and understanding papers that have been written using RWD have been published.
+
+
+== Regional context ==
+
+
+=== US context ===
+In December 2018, the FDA published a framework for its Real-World Evidence program.
+
+
+=== EU context ===
+In 2018, the EMA published a discussion paper on the use of patient disease registries for regulatory purposes (methodological and operational considerations). In 2022, the UK's National Institute for Health and Care Excellence published its RWE Framework, which sets out how RWE could inform health technology assessment.
+The use of real-world data from electronic health records and digital health-monitoring devices is also given as an example of general Post-Market Clinical Follow-up (PMCF) information for medical devices in the guideline "MDCG 2022-21 Guidance on Periodic Safety Update Report (PSUR) according to Regulation (EU) 2017/745 (MDR)" from December 2022.
+
+
+== See also ==
+21st Century Cures Act (US)
+Correlation does not imply causation
+Qualitative research
+Quantitative research
+Sentinel Initiative
+Pharmacoepidemiology and Pharmacoeconomics
+Health economics and Outcomes research
+
+
+== References ==
+
+
+=== Sources ===
+
+
+== External links ==
+"Real World Evidence" at FDA
+Real world data at TriNetX, LLC
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Real_world_evidence-0.md b/data/en.wikipedia.org/wiki/Real_world_evidence-0.md
new file mode 100644
index 000000000..ef824eb38
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Real_world_evidence-0.md
@@ -0,0 +1,43 @@
+---
+title: "Real world evidence"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Real_world_evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:17.363407+00:00"
+instance: "kb-cron"
+---
+
+Real-world evidence (RWE) in medicine is the clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of real-world data (RWD). RWE can be generated by different study designs or analyses, including but not limited to randomized trials (such as large simple trials and pragmatic trials) and retrospective or prospective observational studies. In the USA, the 21st Century Cures Act required the FDA to expand the role of real world evidence.
+Real-world evidence comes into play when clinical trials cannot account for the entire patient population of a particular disease. Patients with comorbidities, or those from a distant geographic region or outside a trial's age limits, who did not participate in any clinical trial may not respond to the treatment in question as expected. RWE helps address these gaps and also makes it possible to analyze the effects of drugs over a longer period of time. Pharmaceutical companies and health insurance payers study RWE to understand patient pathways, to deliver appropriate care for appropriate individuals, and to minimize their own financial risk by investing in drugs that work for patients.
+
+
+== Data quality ==
+Data quality (DQ) is the degree to which a given dataset meets a user's requirements. In the primary healthcare setting, poor quality data can lead to poor patient care, negatively affect the validity and reproducibility of research results and limit the value that such data may have for public health surveillance.
+In order to use real-world data to generate evidence, data must be of sufficient quality. Kahn et al. define data quality as consisting of three components: (1) conformance (do data values adhere to specified standards and formats?; subtypes: value, relational and computational conformance); (2) completeness (are data values present?); and (3) plausibility (are data values believable?; subtypes: uniqueness, atemporal and temporal plausibility). Sometimes, data reliability and data quality are used interchangeably.
+
+
+== Fitness for purpose ==
+In addition to having sufficient data quality, the real-world data must be fit for purpose. An RWD resource can be fit for addressing some questions, but not others. For example, a dataset that lacks mother-to-baby links may not be appropriate for addressing drug risk to the fetus, but can be used for questions of drug safety in patients taking epilepsy treatment (limited to the patient; not including safety for the fetus). Since data quality can be evaluated outside a particular purpose (on a general level), fitness for purpose is evaluated separately from data quality and is not included in the concept of data quality. Further reading: "Real-World Evidence — What Is It and What Can It Tell Us?", The New England Journal of Medicine, December 6, 2016.
+
+
+== See also ==
+Evidence-based medicine
+Levels of evidence
+Pragmatic clinical trial
+Qualitative research
+Quantitative research
+Correlation does not imply causation
+DARWIN EU
+
+
+== References ==
+
+
+== External links ==
+"Real World Evidence" at FDA
+"21st Century Cures Act"
+"Use of Real World Data in Development Programmes" at EMA
+"Observational Health Data Sciences and Informatics"
+"Need for Real World Evidence"
+"Real-world evidence: From activity to impact in healthcare decision making"
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Relative_dating-0.md b/data/en.wikipedia.org/wiki/Relative_dating-0.md
new file mode 100644
index 000000000..d9092a7fa
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Relative_dating-0.md
@@ -0,0 +1,41 @@
+---
+title: "Relative dating"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Relative_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:42.146774+00:00"
+instance: "kb-cron"
+---
+
+Relative dating is the science of determining the relative order of past events (i.e., the age of an object in comparison to another), without necessarily determining their absolute age (i.e., estimated age). In geology, rock or superficial deposits, fossils and lithologies can be used to correlate one stratigraphic column with another. Prior to the discovery of radiometric dating in the early 20th century, which provided a means of absolute dating, archaeologists and geologists used relative dating to determine ages of materials. Though relative dating can only determine the sequential order in which a series of events occurred, not when they occurred, it remains a useful technique. Relative dating by biostratigraphy is the preferred method in paleontology and is, in some respects, more accurate. The Law of Superposition, which states that older layers will be deeper in a site than more recent layers, was the summary outcome of 'relative dating' as observed in geology from the 17th century to the early 20th century.
+
+== Geology ==
+The regular order of the occurrence of fossils in rock layers was discovered around 1800 by William Smith. While digging the Somerset Coal Canal in Southwest England, he found that fossils were always in the same order in the rock layers. As he continued his job as a surveyor, he found the same patterns across England. He also found that certain animals were in only certain layers, and that they were in the same layers all across England. Due to that discovery, Smith was able to recognize the order in which the rocks were formed. Sixteen years after his discovery, he published a geological map of England showing the rocks of different geologic eras.
+
+=== Principles of relative dating ===
+Methods for relative dating were developed when geology first emerged as a natural science in the 18th century. Geologists still use the following principles today as a means to provide information about geologic history and the timing of geologic events.
+
+==== Uniformitarianism ====
+The principle of uniformitarianism states that the geologic processes observed in operation that modify the Earth's crust at present have worked in much the same way over geologic time. A fundamental principle of geology, advanced by the 18th-century Scottish physician and geologist James Hutton, is that "the present is the key to the past." In Hutton's words: "the past history of our globe must be explained by what can be seen to be happening now."
+
+==== Intrusive relationships ====
+The principle of intrusive relationships concerns crosscutting intrusions. In geology, when an igneous intrusion cuts across a formation of sedimentary rock, it can be determined that the igneous intrusion is younger than the sedimentary rock. There are a number of different types of intrusions, including stocks, laccoliths, batholiths, sills and dikes.
+
+==== Cross-cutting relationships ====
+
+The principle of cross-cutting relationships pertains to the formation of faults and the age of the sequences through which they cut. Faults are younger than the rocks they cut; accordingly, if a fault is found that penetrates some formations but not those on top of it, then the formations that were cut are older than the fault, and the ones that are not cut must be younger than the fault. Finding the key bed in these situations may help determine whether the fault is a normal fault or a thrust fault.
+
+==== Inclusions and components ====
+The principle of inclusions and components explains that, with sedimentary rocks, if inclusions (or clasts) are found in a formation, then the inclusions must be older than the formation that contains them. For example, in sedimentary rocks, it is common for gravel from an older formation to be ripped up and included in a newer layer. A similar situation with igneous rocks occurs when xenoliths are found. These foreign bodies are picked up as magma or lava flows, and are incorporated, later to cool in the matrix. As a result, xenoliths are older than the rock which contains them.
+
+==== Original horizontality ====
+The principle of original horizontality states that the deposition of sediments occurs as essentially horizontal beds. Observation of modern marine and non-marine sediments in a wide variety of environments supports this generalization (although cross-bedding is inclined, the overall orientation of cross-bedded units is horizontal).
+
+==== Superposition ====
+The law of superposition states that a sedimentary rock layer in a tectonically undisturbed sequence is younger than the one beneath it and older than the one above it. This is because it is not possible for a younger layer to slip beneath a layer previously deposited. The only disturbance that the layers experience is bioturbation, in which animals and/or plants move things in the layers. However, this process is not enough to allow the layers to change their positions. This principle allows sedimentary layers to be viewed as a form of vertical time line, a partial or complete record of the time elapsed from deposition of the lowest layer to deposition of the highest bed.
+
+==== Faunal succession ====
+The principle of faunal succession is based on the appearance of fossils in sedimentary rocks. As organisms exist at the same time period throughout the world, their presence or (sometimes) absence may be used to provide a relative age of the formations in which they are found. Based on principles laid out by William Smith almost a hundred years before the publication of Charles Darwin's theory of evolution, the principles of succession were developed independently of evolutionary thought. The principle becomes quite complex, however, given the uncertainties of fossilization, the localization of fossil types due to lateral changes in habitat (facies change in sedimentary strata), and that not all fossils may be found globally at the same time.
+
+==== Lateral continuity ====
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Relative_dating-1.md b/data/en.wikipedia.org/wiki/Relative_dating-1.md
new file mode 100644
index 000000000..f37fc8209
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Relative_dating-1.md
@@ -0,0 +1,29 @@
+---
+title: "Relative dating"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Relative_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:42.146774+00:00"
+instance: "kb-cron"
+---
+
+The principle of lateral continuity states that layers of sediment initially extend laterally in all directions; in other words, they are laterally continuous. As a result, rocks that are otherwise similar, but are now separated by a valley or other erosional feature, can be assumed to be originally continuous.
+Layers of sediment do not extend indefinitely; rather, the limits can be recognized and are controlled by the amount and type of sediment available and the size and shape of the sedimentary basin. Sediment will continue to be transported to an area and it will eventually be deposited. However, the layer of that material will become thinner as the amount of material lessens away from the source.
+Often, coarser-grained material can no longer be transported to an area because the transporting medium has insufficient energy to carry it to that location. In its place, the particles that settle from the transporting medium will be finer-grained, and there will be a lateral transition from coarser- to finer-grained material. The lateral variation in sediment within a stratum is known as sedimentary facies.
+If sufficient sedimentary material is available, it will be deposited up to the limits of the sedimentary basin. Often, the sedimentary basin is within rocks that are very different from the sediments that are being deposited, in which the lateral limits of the sedimentary layer will be marked by an abrupt change in rock type.
+
+==== Inclusions of igneous rocks ====
+
+Melt inclusions are small parcels or "blobs" of molten rock that are trapped within crystals that grow in the magmas that form igneous rocks. In many respects they are analogous to fluid inclusions. Melt inclusions are generally small – most are less than 100 micrometres across (a micrometre is one thousandth of a millimeter, or about 0.00004 inches). Nevertheless, using microscopic observations and a range of chemical microanalysis techniques, geochemists and igneous petrologists can obtain an abundance of useful information from them. One of the most common uses of melt inclusions is to study the compositions of magmas present early in the history of specific magma systems. This is because inclusions can act like "fossils" – trapping and preserving these early melts before they are modified by later igneous processes. In addition, because they are trapped at high pressures, many melt inclusions also provide important information about the contents of volatile elements (such as H2O, CO2, S and Cl) that drive explosive volcanic eruptions.
+Sorby (1858) was the first to document microscopic melt inclusions in crystals. The study of melt inclusions has been driven more recently by the development of sophisticated chemical analysis techniques. Scientists from the former Soviet Union led the study of melt inclusions in the decades after World War II (Sobolev and Kostyuk, 1975), and developed methods for heating melt inclusions under a microscope, so changes could be directly observed.
+Although they are small, melt inclusions may contain a number of different constituents, including glass (which represents magma that has been quenched by rapid cooling), small crystals and a separate vapour-rich bubble. They occur in most of the crystals found in igneous rocks and are common in the minerals quartz, feldspar, olivine and pyroxene. The formation of melt inclusions appears to be a normal part of the crystallization of minerals within magmas, and they can be found in both volcanic and plutonic rocks.
+
+==== Included fragments ====
+The law of included fragments is a method of relative dating in geology. Essentially, this law states that clasts in a rock are older than the rock itself.  One example of this is a xenolith, which is a fragment of country rock that fell into passing magma as a result of stoping. Another example is a derived fossil, which is a fossil that has been eroded from an older bed and redeposited into a younger one.
+This is a restatement of Charles Lyell's original principle of inclusions and components from his 1830 to 1833 multi-volume Principles of Geology, which states that, with sedimentary rocks, if inclusions (or clasts) are found in a formation, then the inclusions must be older than the formation that contains them. For example, in sedimentary rocks, it is common for gravel from an older formation to be ripped up and included in a newer layer. A similar situation with igneous rocks occurs when xenoliths are found. These foreign bodies are picked up as magma or lava flows, and are incorporated, later to cool in the matrix. As a result, xenoliths are older than the rock which contains them.
+
+=== Planetology ===
+
+Relative dating is used to determine the order of events on Solar System objects other than Earth; for decades, planetary scientists have used it to decipher the development of bodies in the Solar System, particularly in the vast majority of cases for which we have no surface samples. Many of the same principles are applied. For example, if a valley is formed inside an impact crater, the valley must be younger than the crater.
+Craters are very useful in relative dating;  as a general rule, the younger a planetary surface is, the fewer craters it has. If long-term cratering rates are known to enough precision, crude absolute dates can be applied based on craters alone; however, cratering rates outside the Earth-Moon system are poorly known.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Relative_dating-2.md b/data/en.wikipedia.org/wiki/Relative_dating-2.md
new file mode 100644
index 000000000..4aa061e90
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Relative_dating-2.md
@@ -0,0 +1,53 @@
+---
+title: "Relative dating"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Relative_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:42.146774+00:00"
+instance: "kb-cron"
+---
+
+== Ecology ==
+Dating the layers of bird nests made from plastic pollution can be done by checking the expiration date on food packaging. By doing so, a stratigraphy emerges, documenting the history of the breeding site and reflecting all earlier breeding attempts. A nest found in Amsterdam in 2021 could be traced back all the way to the early 1990s. The oldest nest item was a Mars bar advertising the 1994 FIFA World Cup. Single-use packages of perishable products like 'fresh milk' or a 'ripe avocado' proved to be very precise markers in time, referring not only to the year but also to the near-exact date on which the product was most likely consumed. Non-food or non-perishable, shelf-stable products could result in less precise dating.
+
+== Archaeology ==
+
+Relative dating methods in archaeology are similar to some of those applied in geology. The principles of typology can be compared to the biostratigraphic approach in geology.
+
+== See also ==
+Astronomical chronology
+Age of the Earth
+Age of the universe
+Chronological dating, archaeological chronology
+Absolute dating
+Relative dating, this article
+Phase (archaeology)
+Archaeological association
+Archaeological context
+Archaeological culture – Group of artifact types and structure layouts that often occur together
+Relationship (archaeology)
+Sequence
+Seriation (archaeology) – Archaeological method of relative dating
+Geochronology
+Chronostratigraphy
+Marker horizon
+Thermochronology
+Stratigraphy
+Structural geology
+Unconformity
+Geologic time scale
+Geological history of Earth
+Future of the Earth
+Plate tectonics
+Plate reconstruction
+Timeline of natural history
+List of geochronologic names
+General
+Consilience, evidence from independent, unrelated sources can "converge" on strong conclusions
+
+== References ==
+
+== Citations ==
+"Biostratigraphy: William Smith". Understanding Evolution. 2009. University of California Museum of Paleontology. 23 January 2009 <http://evolution.berkeley.edu/evolibrary/article/0_0_0/history_11>
+Monroe, James S., and Reed Wicander. The Changing Earth: Exploring Geology and Evolution, 2nd ed. Belmont: West Publishing Company, 1997. ISBN 0-314-09577-2
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Remote_experiment-0.md b/data/en.wikipedia.org/wiki/Remote_experiment-0.md
new file mode 100644
index 000000000..e6b5d7ac5
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Remote_experiment-0.md
@@ -0,0 +1,37 @@
+---
+title: "Remote experiment"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Remote_experiment"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:46.151408+00:00"
+instance: "kb-cron"
+---
+
+A remote experiment is a real experiment with real laboratory instruments and equipment that can be controlled by a computer through the internet. One or more remote experiments are accessible in a remote laboratory.
+Remotely controlled experiments have become a widespread tool for teaching physics at the university level of education. When executing remote experiments, the remote users can change system parameters, observe results in graphical form and/or by video transmission from a webcam, and download the experimental results. Sometimes a booking system is available that allows users to book time for access to a remote experiment in advance. The user operates the remote experiment via a graphical user interface. Remote experiments are positively evaluated by learners.
+
+
+== Advantages of remote experiments ==
+When compared to simulations in virtual laboratories and to experiments in traditional laboratories, remotely controlled experiments have the following advantages:
+
+remote experiments can be carried out from anywhere in the world;
+no time restriction since experiments are available 24 hours a day, 7 days a week;
+overcoming problems with limited laboratory capacity for numerous students;
+safe and secure operation of equipment without danger of user's injury;
+remote experiments can be shared between educational institutions, as for example in the Labshare initiative.
+
+
+== Users of remote experiments ==
+Remote experiments are a powerful technology which can be implemented in distance education to provide learners with hands-on experience. Remote experiments can be especially valuable for some groups of users:
+
+learners with physical disabilities, who cannot attend traditional laboratory exercises;
+part-time students, who cannot attend traditional laboratory exercises;
+learners who are undergoing continued education (Lifelong learning) and have to integrate learning activities into their everyday schedule.
+
+
+== References ==
+
+
+== See also ==
+Remote laboratory
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Repeated_measures_design-0.md b/data/en.wikipedia.org/wiki/Repeated_measures_design-0.md
index c2139e99b..97255a66a 100644
--- a/data/en.wikipedia.org/wiki/Repeated_measures_design-0.md
+++ b/data/en.wikipedia.org/wiki/Repeated_measures_design-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/Repeated_measures_design"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:51:42.307330+00:00"
+date_saved: "2026-05-05T09:56:47.372113+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Repeated_measures_design-1.md b/data/en.wikipedia.org/wiki/Repeated_measures_design-1.md
index a0198cddd..6518341a0 100644
--- a/data/en.wikipedia.org/wiki/Repeated_measures_design-1.md
+++ b/data/en.wikipedia.org/wiki/Repeated_measures_design-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/Repeated_measures_design"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:51:42.307330+00:00"
+date_saved: "2026-05-05T09:56:47.372113+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Resistive_plate_chamber-0.md b/data/en.wikipedia.org/wiki/Resistive_plate_chamber-0.md
new file mode 100644
index 000000000..3413d68fa
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Resistive_plate_chamber-0.md
@@ -0,0 +1,14 @@
+---
+title: "Resistive plate chamber"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Resistive_plate_chamber"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:48.534735+00:00"
+instance: "kb-cron"
+---
+
+A resistive plate chamber (RPC) is a particle detector widely used in high energy physics. RPCs are used for detecting muons in most modern experiments, including ATLAS, CMS, Belle II and BES III.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Room_temperature-0.md b/data/en.wikipedia.org/wiki/Room_temperature-0.md
new file mode 100644
index 000000000..28c9a7b46
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Room_temperature-0.md
@@ -0,0 +1,42 @@
+---
+title: "Room temperature"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Room_temperature"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:49.758772+00:00"
+instance: "kb-cron"
+---
+
+Room temperature, colloquially, denotes the range of air temperatures most people find comfortable indoors while dressed in typical clothing. Comfortable temperatures can be extended beyond this range depending on humidity, air circulation, and other factors.
+In certain fields, like science and engineering, and within a particular context, room temperature can mean different agreed-upon ranges. In contrast, ambient temperature is the actual temperature, as measured by a thermometer, of the air (or other medium and surroundings) in any particular place. The ambient temperature (e.g. an unheated room in winter) may be very different from an ideal room temperature.
+Food and beverages may be served at "room temperature", meaning neither heated nor cooled.
+
+
+== Comfort temperatures ==
+
+Comfort temperature is interchangeable with neutral temperature in the scientific literature, which can be calculated through regression analysis between thermal sensation votes and indoor temperature. The neutral temperature is the solution of the resulting regression model by setting the thermal sensation vote as zero. The American Heritage Dictionary of the English Language identifies room temperature as around 20–22 °C (68–72 °F; 293–295 K), while the Oxford English Dictionary states that it is "conventionally taken as about 20 °C (68 °F; 293 K)".
+Ideal room temperature varies widely depending on the surrounding climate. Studies from Indonesia have shown that the range of comfortable temperature is 24–29 °C (75–84 °F) for local residents. Studies from Nigeria show a comfortable temperature range of 26–28 °C (79–82 °F), comfortably cool 24–26 °C (75–79 °F) and comfortably warm 28–30 °C (82–86 °F). A field study conducted in Hyderabad, India returned a comfort band of 26–32.45 °C (79–90 °F) with a mean of 29.23 °C (85 °F). A study conducted in Jaipur, India among healthy young men found a neutral thermal comfort temperature of 30.15 °C (86 °F), although a range of 25.9–33.8 °C (79–93 °F) was observed.
+People are highly sensitive to even small differences in environmental temperature. At 24 °C (75 °F), a difference of 0.38 °C (0.68 °F) can be detected between the temperature of two rooms. 
+Owing to variations in humidity and (likely) clothing, recommendations for summer and winter may vary; a suggested typical range for summer is 23–25.5 °C (73–78 °F), with that for winter being 20–23.5 °C (68–74 °F). Some studies have suggested that thermal comfort preferences of men and women may differ significantly, with women on average preferring higher ambient temperatures.
+Rooms may be maintained at an ambient temperature above the comfort temperature in hot weather, or below it in cold weather, if required by cost considerations or practical issues (e.g. lack of air conditioning or relatively high expense of heating). In the recent past, it was common for winter house temperatures to be kept below the comfort level; a 1978 UK study found average indoor home temperatures to be 15.8 °C (60.4 °F) while Japan in 1980 had median home temperatures of 13 °C (55 °F) to 15 °C (59 °F).
+
+
+== Health effects ==
+
+The World Health Organization in 1987 found that comfortable indoor temperatures of 18–24 °C (64–75 °F) were not associated with health risks for healthy adults with appropriate clothing, humidity, and other factors. For infants, elderly, and those with significant health problems, a minimum of 20 °C (68 °F) was recommended. Temperatures lower than 16 °C (61 °F) with humidity above 65% were associated with respiratory hazards including allergies.
+The WHO's 2018 guidelines give a strong recommendation that a minimum of 18 °C (64 °F) is a "safe and well-balanced indoor temperature to protect the health of general populations during cold seasons". A higher minimum temperature may be necessary for vulnerable groups including children, the elderly, and people with cardiorespiratory disease and other chronic illnesses. However, the recommendation regarding risk of exposure to high indoor temperatures is only "conditional". Minimal-risk high temperatures range from about 21 to 30 °C (70 to 86 °F) depending on the region, with maximum acceptable temperatures between 25 and 32 °C (77 and 90 °F).
+
+
+== Definitions in science and industry ==
+Temperature ranges are defined as room temperature for certain products and processes in industry, science, standards, and consumer goods. For instance, for the shipping and storage of pharmaceuticals, the United States Pharmacopeia-National Formulary (USP-NF) defines controlled room temperature as between 20 and 25 °C (68 and 77 °F), with excursions between 15 and 30 °C (59 and 86 °F) allowed, provided the mean kinetic temperature does not exceed 25 °C (77 °F). The European Pharmacopoeia defines it as being simply 15 to 25 °C (59 to 77 °F), and the Japanese Pharmacopeia defines "ordinary temperature" as 15 to 25 °C (59 to 77 °F), with room temperature being 1 to 30 °C (34 to 86 °F). Merriam-Webster gives as a medical definition a range of 15 to 25 °C (59 to 77 °F) as being suitable for human occupancy, and at which laboratory experiments are usually performed.
+In physics and chemistry, room temperature usually refers to the ambient temperature in the laboratory; for calculations one frequently assumes 20 °C, 25 °C or 300 K (26.85 °C).
+
+
+== See also ==
+Standard conditions for temperature and pressure
+ISO 1 – ISO standard temperature, 20 °C
+Indoor air quality
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Round_Hill_generator-0.md b/data/en.wikipedia.org/wiki/Round_Hill_generator-0.md
new file mode 100644
index 000000000..ee1d441b4
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Round_Hill_generator-0.md
@@ -0,0 +1,25 @@
+---
+title: "Round Hill generator"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Round_Hill_generator"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:50.995002+00:00"
+instance: "kb-cron"
+---
+
+The Round Hill generator is an experimental high-voltage Van de Graaff generator built at Round Hill, Massachusetts. When constructed in 1933, it was designed as the world's most powerful particle accelerator. The generator is now used at the Boston Museum of Science for educational demonstrations.
+The instrument was constructed by a Massachusetts Institute of Technology (MIT) team led by physicist Robert J. Van de Graaff, who hoped to be the first scientist to artificially split the atom. They completed construction a year after John Cockcroft and Ernest Walton accomplished the feat in 1932. The machine was the forerunner of high-voltage electrostatic particle accelerators built by the High Voltage Engineering Corporation, which Van de Graaff and his student John G. Trump introduced to cancer clinics and nuclear physics labs around the world.
+Too large to fit in a research lab, the 43-foot-tall generator was assembled in Round Hill's airship hangar. Originally, a technician ran the machine from within one of its metal terminals, which acted as a Faraday cage. It was first demonstrated to the public in November 1933 and dubbed an "electrical Niagara" by the New York Times because of its copious electrical discharges. Coverage in Time, Science, and a review by Nikola Tesla in Scientific American brought fame to its inventor. Although the accelerator was designed to reach potentials of 10 megavolts, challenges with air insulation limited it to 5.1 megavolts.
+When the research program at Round Hill ended in 1936, the generator was overhauled and installed on MIT's campus in Cambridge, Massachusetts. After two decades of research use at MIT, the Round Hill generator was moved to the Boston Museum of Science in 1955, where it remains operational in the "Theater of Electricity" exhibition.
+
+== History ==
+
+=== Origins of the Van de Graaff generator ===
+The development of Van de Graaff's high-voltage generators began during his graduate studies in Europe, where he encountered many leading physicists of the era. In 1924, Van de Graaff attended radioactivity lectures by Marie Curie. Her use of a clicking Geiger counter and loudspeaker to detect alpha particles captivated Van de Graaff, and he resolved to find ways to study "individual particles" rather than statistical thermodynamics. During a brief stay at Leiden University, he was encouraged to pursue methods to artificially accelerate particles through conversations with his roommate, Robert Oppenheimer. At Oxford, Van de Graaff was influenced by Ernest Rutherford's 1927 anniversary address to the Royal Society, in which Rutherford expressed his "long-standing ambition to have available...a copious supply of atoms and electrons...transcending in energy the alpha and beta particles from radioactive substances."
+Van de Graaff proposed a simple, inexpensive technique for generating high voltages with electrostatics: attract electrons to a belt, collect them in a metal terminal, then accelerate them towards a target. In 1929, while a National Research Fellow at Princeton University, he constructed a prototype generator, a rudimentary device built from "a tin can, a silk ribbon and a small motor, at no expense." This prototype achieved 80,000 volts but had a fundamental limitation: sharp edges on the tin can created electric field concentrations that caused corona discharge, a form of electrical breakdown that prevented higher voltages.
+At Princeton, Van de Graaff found a champion for his research in physics department chair Karl Taylor Compton, who saw the generator's potential for particle physics. Compton was following parallel particle accelerator projects at the University of Cambridge, where John Cockcroft and Ernest Walton were developing voltage multiplier circuits, and at the University of California, where Ernest Lawrence had proposed the cyclotron. In 1930, MIT recruited Compton as its new president, hoping to transform the engineering school into a premier scientific research institution. Compton, in turn, recruited Van de Graaff in September 1931 to augment MIT's growing physics department.
+Van de Graaff improved on his design by mounting a three-foot polished metal sphere on an insulated column. This approach eliminated the sharp edges of his tin-can prototype that caused electrical breakdown and generated approximately one million volts despite costing less than $100 to build. A demonstration of the generator at the American Institute of Physics inaugural dinner in November 1931 attracted significant attention. It showed that electrostatic generation could achieve voltages far exceeding those available from natural radioactive sources, making the case for a larger project. Back at MIT, Van de Graaff and his collaborators made plans for an air-insulated model to reach 10 megavolts, which would break the record for artificially generated voltages. This would require a terminal 100 times greater in volume.
+
+=== Round Hill installation ===
+In 1926, MIT had established a research program at Round Hill, the estate of Colonel Edward H.R. Green in South Dartmouth, Massachusetts. Green, a technology enthusiast and son of famed investor Hetty Green, agreed to open Round Hill to MIT researchers, expand the existing radio station on the property, and underwrite researchers' operating expenses. The facility was originally directed by Edward L. Bowles, who established research programs in radio communications, fog research, and aircraft navigation.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Round_Hill_generator-1.md b/data/en.wikipedia.org/wiki/Round_Hill_generator-1.md
new file mode 100644
index 000000000..3c4d7cad0
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Round_Hill_generator-1.md
@@ -0,0 +1,29 @@
+---
+title: "Round Hill generator"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Round_Hill_generator"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:50.995002+00:00"
+instance: "kb-cron"
+---
+
+In 1931, Van de Graaff demonstrated a prototype generator at Round Hill. Colonel Green was "delighted" by the generator's spectacular electrical discharges and agreed to Compton's suggestion that a full-scale generator be constructed at the estate. Construction of the large-scale generator began in 1932, with Van de Graaff's equipment and research group moving to Round Hill in August of that year. The high-voltage installation was managed by Lester and Chester van Atta, Edward W. Samson, and Doyle L. Northrup, who operated independently of existing Round Hill research. Needing environmental control, the 43-foot generator was housed in an airship hangar on the estate that had previously been used for a Goodyear dirigible.
+The machine was first demonstrated to the public on November 28, 1933, generating significant scientific and media attention. The generator produced spectacular lightning discharges, which made it a popular attraction for visitors and potential investors. The team conducted regular demonstrations, during which the machine would shoot huge purple bolts of lightning into the rafters of the hangar, creating a thunderous cracking sound. Though visually arresting, the display showed the operating limits of an air-insulated generator: the arcs of lightning demonstrated the difficulty of sustaining high voltages without breakdown.
+In March 1934, famed electrical engineer Nikola Tesla wrote a cover story on the Round Hill generator for Scientific American, observing that the generator represented "a distinct advance over its predecessors," which included his own Tesla coil. He also expressed skepticism about the project's ultimate goals, writing that "it is highly probable that the attempts to smash the atomic nucleus and to transmute elements will yield results of doubtful value."
+The Round Hill hangar proved a harsh site for making use of the generator's capabilities. Although the components were designed to limit discharges, pigeon droppings on the terminal spheres caused the intense sparking seen by the public. Humidity may also have promoted voltage breakdown in the terminal. Coastal rain and salt fog weakened the paper columns holding up the terminals. Although the generator was designed to reach 10 megavolts, its highest recorded voltages reached only half this level.
+Although the voltages were adequate for nuclear disintegration, no nuclear experiments were completed at Round Hill. In addition to environmental issues, it took almost four years to design an acceleration tube that could control a particle beam. These difficulties at Round Hill were disappointing to Van de Graaff, but did not deter him from further development of the technology.
+
+=== Transfer to MIT's Cambridge campus ===
+Following Colonel Green's death in 1936 and subsequent legal complications over his estate, the Round Hill research program ended. In 1937, the generator was transferred to MIT's Cambridge campus, where it was completely redesigned and reassembled. The move, along with construction of an improved charging belt assembly and remote control system, required "somewhat less than a year."
+The reconfigured generator featured significant improvements over the Round Hill design. The two towers were reassembled as a single conjoined machine, which was housed within a new, sealed, welded steel shell enclosure with underground rooms for equipment and experiments. This redesign solved several problems: it eased repair and maintenance issues and addressed radiation safety concerns by placing the researchers in a small laboratory within the spheres, shielded from radiation. Targeting could be done in an underground room below the operators. The reconfigured generator operated at a lower but more stable voltage of up to 2.4 megavolts.
+
+=== Transfer to Boston Museum of Science ===
+
+The generator remained in its enclosure at MIT until 1955, when its location was designated a site for a new cyclotron. Karl Compton, in his final years at MIT, suggested that the Round Hill generator could have value in educational demonstrations, just as it had captivated the media on its first public demonstration. MIT transferred the generator to the new building of the Boston Museum of Science, where it became the centerpiece of the museum's Elihu Thomson Theater of Electricity. Originally designed with 150 seats, the theater was enlarged in 1980 and improvements were made to run multiple shows a day.
+
+== Design and operation ==
+
+The Round Hill generator consisted of two separate units, each with a polished aluminum sphere 15 feet in diameter resting on a hollow cylindrical insulating column 25 feet high and 6 feet in diameter. The total height of the spheres above the ground was 43 feet. The columns were made of a material called Textolite, composed of hundreds of thin layers of paper cemented together under high pressure with shellac. The spheres and columns were mounted on heavy, four-wheeled trucks that operated on a railway track 14 feet wide, allowing researchers to vary the distance between the terminals. Each unit with its truck weighed approximately 16 tons.
+Each unit contained endless paper belts operating vertically within the hollow columns, running from driving motors in the bases to pulleys within the spheres. The belts traveled at speeds of up to 5,650 feet per minute. The electrical charge was "sprayed" onto the belts at the base at a comparatively low potential of 20,000 volts and carried up to the spheres, where it accumulated. One sphere stored negative charges, while the other stored positive charges. A vacuum-insulated accelerator tube was positioned between the spheres.
+Although the generator was originally designed to produce 10 megavolt potentials, environmental conditions made it difficult to reach this design rating. The maximum voltage achieved by the Round Hill generator was approximately 5.1 million volts between the terminals, with each sphere developing about 3.5 million volts. At full operating capacity, the generator could deliver a charging current of 2.1 milliamperes, with approximately 1.1 milliamperes available for application to an accelerating tube at maximum voltage.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Round_Hill_generator-2.md b/data/en.wikipedia.org/wiki/Round_Hill_generator-2.md
new file mode 100644
index 000000000..d6590ba79
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Round_Hill_generator-2.md
@@ -0,0 +1,27 @@
+---
+title: "Round Hill generator"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Round_Hill_generator"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:50.995002+00:00"
+instance: "kb-cron"
+---
+
+== Technical contributions ==
+The Round Hill generator was primarily designed for nuclear physics research, particularly for the study of atomic nuclei through high-energy particle bombardment. In the early 1930s, this represented the frontier of experimental atomic physics. Ultimately, John Cockcroft and Ernest Walton's 1932 voltage-multiplier circuit produced the first controlled "splitting of the atom." However, electrostatic power offered better beam control and voltage regulation than contemporaries like the Cockcroft-Walton device, leading to its widespread adoption in nuclear experiments.
+The Round Hill generator incorporated several technical innovations that advanced the state of high-voltage engineering. The generator featured an improved belt-charging system that provided more stable operation and better voltage control than previous Van de Graaff designs. The vacuum-insulated accelerating tube prompted advances in vacuum technology. The reinstallation in Cambridge also introduced several innovations that became standard in subsequent accelerator designs. The generator featured one of the first comprehensive remote control systems for high-voltage equipment to manage radiation exposure concerns. Once re-installed in Cambridge, the installation's vacuum technology achieved pressures as low as 6×10−7 mm Hg, with normal operating pressure of 4×10−6 mm Hg during electron beam operation. The system could also accelerate both positive ions and electrons, though positive ion work was limited by outgassing from electrode surfaces.
+The basic concept of the Van de Graaff generator was openly published, leading many other physics labs to build models of the generator. However, many of the improvements made during the Round Hill experiment were patented by Van de Graaff, who filed a first patent in December 1931. In a first-of-its-kind partnership, these patents were assigned by MIT to the Research Corporation, which helped to fund the Round Hill installation. Later, these patents were used by Van de Graaff and his protege, John G. Trump, in a company they founded to build particle accelerators for cancer treatment and nuclear science, the High Voltage Engineering Corporation.
+
+== Legacy ==
+Experimental physicist D. Allan Bromley concludes: "Although performance of the Round Hill generator was disappointing to Van de Graaff the experience gained with this machine was invaluable to later designs and most particularly to subsequent work at MIT." The Round Hill generator's design principles influenced subsequent accelerator development and medical applications. In particular, the project demonstrated the feasibility of producing high, controllable voltages through electrostatic transport, leading to widespread adoption of the Van de Graaff accelerator in nuclear physics research.
+While the Round Hill project was underway, Van de Graaff and Trump also worked on methods for making the technology more compact using vacuum insulation and gas insulation. In 1937, Trump developed a smaller Van de Graaff generator operating at 1 million volts for cancer therapy that was installed at Harvard's Huntington Memorial, the first use of the electrostatic accelerator in clinical medicine. By 1936, Trump was leading development of a 2-million-volt medical x-ray generator, described as producing "penetrating short-wave x-rays at a potential of one million volts for medical research and treatment of malignant disease."
+
+After World War II, Trump and Van de Graaff further improved high-voltage electrostatic generators through the High Voltage Engineering Corporation (HVEC), founded in 1946. The company would eventually produce about 500 high-voltage particle accelerators for research institutions worldwide. Trump remained focused on applications of the technology, while Van de Graaff focused on increasing the scale of the machines, which became primary instruments for nuclear physics research around the world.
+
+== Further reading ==
+MIT Records of the Office of the President, 1930-1959, AC-0004, MIT Institute Archives
+Robert Jemison Van de Graaff papers, MC-0045, MIT Institute Archives
+Theater of Electricity records, A2022-07-01, Boston Museum of Science
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Rye_Riptides-0.md b/data/en.wikipedia.org/wiki/Rye_Riptides-0.md
new file mode 100644
index 000000000..4faba8302
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Rye_Riptides-0.md
@@ -0,0 +1,22 @@
+---
+title: "Rye Riptides"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Rye_Riptides"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:52.171847+00:00"
+instance: "kb-cron"
+---
+
+Rye Riptides is a boat that was made by a 5th grade class in New Hampshire, released into the Atlantic Ocean in 2020, and discovered in Norway in 2022 after 462 days at sea. The boat was built by two science classes at Rye Junior High School in New Hampshire and launched on October 25, 2020; it was found on February 1, 2022, on a small uninhabited island off the larger island of Smøla.
+
+
+== See also ==
+Friendly Floatees
+
+
+== References ==
+
+
+== External links ==
+Official site
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/SESRI-0.md b/data/en.wikipedia.org/wiki/SESRI-0.md
new file mode 100644
index 000000000..847034b30
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/SESRI-0.md
@@ -0,0 +1,64 @@
+---
+title: "SESRI"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/SESRI"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:10.664517+00:00"
+instance: "kb-cron"
+---
+
+Social and Economic Survey Research Institute (SESRI) is a survey institute and social science research contributor that is part of Qatar University.
+Qatar University recognized the importance of enhancing its social science research capacity, with a particular focus on evidence-based monitoring of national development and the trends of its indicators. Therefore, QU established SESRI in October 2008. To advance evidence-based decision-making and policy-making, SESRI established its Policy Department in 2014.
+Since its commencement, SESRI has carried out more than 100 high-quality surveys of Qatari citizens and of Qatar's large expatriate population. Surveys were conducted to guide the public and the government, in line with the Qatar National Vision 2030. All surveys were performed in accordance with the highest scientific and ethical standards.
+
+
+== SESRI Departments ==
+SESRI has three departments:
+
+Survey Operations Support Department
+Research Department
+Policy Department
+
+
+== Accreditation ==
+To ensure its commitment to quality and to affirm compliance with international standards, SESRI successfully passed the requirements for ISO 9001:2015 Quality Management Standards from Bureau Veritas.
+
+
+== Key Roles ==
+Provides consultancy services in survey research and socio-cultural development.
+Advances evidence-based decision-making and policy-making.
+Builds capacity among Qatari nationals in survey research methodology.
+Researches topics of interest to Qatar (such as labor markets and migration, and female participation in the labor force).
+Provides reliable and up-to-date scientific data about the challenges and opportunities associated with Qataris between the ages of 18 and 29.
+Implements survey projects on social and economic issues.
+Contributes to knowledge that is applied in character.
+Explores strategies and priorities for meeting the UN 2030 Sustainable Development Goals (SDGs) in Qatar.
+
+
+== Preeminent Contributions ==
+Provides strategic analysis for policymakers on key issues such as family, gender relations, and demographic transitions in Qatar.
+Ongoing agricultural census to boost food security strategy in Qatar.
+‘World Mental Health Qatar’ (WMHQ) study to enhance the understanding of the epidemiology of mental illness in Qatar.
+Assessing the extent of ‘relative poverty’ among lower-income Qatari households.
+Conducted a survey for the National Human Rights Committee (NHRC) on Civil Society Organizations (CSOs) in Qatar to raise CSOs' awareness of the significance of the NHRC.
+
+
+== SESRI Policy Briefs ==
+Using Innovative Technology in the Management of Food Waste in Qatar.
+Greywater for Qatar's Water & Food Security.
+
+
+== Links ==
+Qatar University
+Qatar University Library
+Mariam Al Maadeed
+
+
+== External links ==
+Research and Graduate Studies Office at Qatar University
+Qatar University Newsroom
+National Human Rights Committee (NHRC)
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Scientific_evidence-0.md b/data/en.wikipedia.org/wiki/Scientific_evidence-0.md
index 62ff65ee8..dc66f77b1 100644
--- a/data/en.wikipedia.org/wiki/Scientific_evidence-0.md
+++ b/data/en.wikipedia.org/wiki/Scientific_evidence-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/Scientific_evidence"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:28:53.505176+00:00"
+date_saved: "2026-05-05T09:56:18.593506+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Scientific_evidence-1.md b/data/en.wikipedia.org/wiki/Scientific_evidence-1.md
index bac6e8470..b17da804e 100644
--- a/data/en.wikipedia.org/wiki/Scientific_evidence-1.md
+++ b/data/en.wikipedia.org/wiki/Scientific_evidence-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/Scientific_evidence"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:28:53.505176+00:00"
+date_saved: "2026-05-05T09:56:18.593506+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Security_bag-0.md b/data/en.wikipedia.org/wiki/Security_bag-0.md
new file mode 100644
index 000000000..95afe52e6
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Security_bag-0.md
@@ -0,0 +1,43 @@
+---
+title: "Security bag"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Security_bag"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:19.789021+00:00"
+instance: "kb-cron"
+---
+
+A security bag is a heavy duty bag used to contain high-value products, documents or legally sensitive items. Envelopes with security features are called security envelopes as well as security bags. Cash for deposit in a bank is often placed in a special deposit bag with security features. When they are used to contain items related to a crime, special evidence bags are used. Authentication of signatures and chain of custody are often required.
+
+
+== Construction ==
+Security bags or envelopes may be specially designed plastic bags, paper bags, or fabric bags. Bags or envelopes can be made tamper resistant to make unauthorized entry difficult. It is often more important for these to be tamper evident so that any unauthorized entry is easily detected.
+Bags and envelopes are often closed by an integral pressure sensitive adhesive on the closing flap. The removal of a release liner allows convenient closing of the bag. Several types of security features can be included in the flap structure and are designed to indicate its opening irreversibly.
+Separate security tapes are also used. Tamper-indicating security seals employ a variety of mechanisms for operation, each with its own advantages and disadvantages.
+Documentation such as labels for certified signatures for custody and chain-of-custody labels are frequently included.
+
+
+== Use ==
+No single security feature can be considered "tamper proof". Layers of tamper-resistant and tamper-evident features, as well as broader security systems, are needed to provide better assurance of security. All security products can be foiled by a knowledgeable person with sufficient time and with access to specialized tools, solvents, extreme temperatures, other security bags, security tapes, etc.
+
+
+== Faraday Bag ==
+When electronic devices, cell phones, media storage, etc. are collected as evidence in a criminal investigation, Faraday bags can be used to prevent damage or adulteration. Acting as Faraday cages, these bags have electromagnetic shielding to prevent electronic access to their contents.
+Smaller personal Faraday bags are used to prevent unwanted access to credit cards, keyfobs, cell phones, etc.
+
+
+== See also ==
+Currency packaging
+Dye pack
+Evidence management
+Provenance
+Package pilferage
+Security seal
+Traceability
+
+
+== Notes ==
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Self-evidence-0.md b/data/en.wikipedia.org/wiki/Self-evidence-0.md
new file mode 100644
index 000000000..dcbb5a635
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Self-evidence-0.md
@@ -0,0 +1,58 @@
+---
+title: "Self-evidence"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Self-evidence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:20.931953+00:00"
+instance: "kb-cron"
+---
+
+In epistemology (theory of knowledge), a self-evident proposition is a proposition that is known to be true by understanding its meaning without proof, and/or by ordinary human reason.
+Some epistemologists deny that any proposition can be self-evident. For most others, the beliefs that one is conscious and that one possesses free will are offered as examples of self-evidence. However, the beliefs that someone else is conscious or has free will are not epistemically self-evident.
+The following proposition is often said to be self-evident: "A finite whole is greater than, or equal to, any of its parts".
+A logical argument for a self-evident conclusion would demonstrate only an ignorance of the purpose of persuasively arguing for the conclusion based on one or more premises that differ from it (see ignoratio elenchi and begging the question).
+
+
+== Analytic propositions ==
+It is sometimes said that a self-evident proposition is one whose denial is self-contradictory. It is also sometimes said that an analytic proposition is one whose denial is self-contradictory. But the concepts mean different things, i.e., an analytic proposition is not always a self-evident proposition. 
+Provided that one understands and believes a self-evident proposition, self-evident propositions are not in need of proof. Likewise, that their denial is self-contradictory does not need to be proven. It is in this sense that the self-contradictions at work in self-evident and analytic propositions are different.
+Not all analytic propositions are self-evident, and it is sometimes claimed that not all self-evident propositions are analytic: e.g. my knowledge that I am conscious.
+
+
+== Other uses ==
+
+
+=== Informal speech ===
+In informal speech, self-evident often merely means obvious, but the epistemological definition is stricter.
+
+
+=== Moral propositions ===
+Moral propositions may also be regarded as self-evident, although the is–ought problem described by David Hume considers that there is no coherent way to transition from a positive statement to a normative one.
+For example, Alexander Hamilton cited the following moral propositions as self-evident in the Federalist No. 23:
+
+The means ought to be proportioned to the end.
+Every power ought to be commensurate with its object.
+There ought to be no limitation of a power destined to effect a purpose which is itself incapable of limitation.
+A famous claim of the self-evidence of a moral truth is in the United States Declaration of Independence, which states, "We hold these Truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness."; philosophically, these propositions' self-evidence is debatable.
+
+
+=== Mathematics ===
+In mathematics, self-evident refers to statements that need no proof. Sometimes axioms are described as self-evident. Other statements are self-evident because the statement is a proof for itself.
+
+
+== See also ==
+
+2 + 2 = 5 § Self-evident truth and self-evident falsehood
+Axiom
+Contradiction
+Foundationalism
+Introspection
+Law of identity
+Primitive notion
+Self-reference
+Self-refuting idea
+Infinite regress
+
+
+== Notes ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Semblance_analysis-0.md b/data/en.wikipedia.org/wiki/Semblance_analysis-0.md
new file mode 100644
index 000000000..142a2d4a2
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Semblance_analysis-0.md
@@ -0,0 +1,86 @@
+---
+title: "Semblance analysis"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Semblance_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:07.132203+00:00"
+instance: "kb-cron"
+---
+
+Semblance analysis is a process used in the refinement and study of seismic data. The use of this technique along with other methods makes it possible to greatly increase the resolution of the data despite the presence of background noise. The data obtained from semblance analysis is usually easier to interpret when trying to deduce the underground structure of an area. Weighted semblance can be used to increase the resolution of traditional semblance or to make traditional semblance capable of analyzing more complicated seismic data.
+
+
+== History ==
+Semblance analysis is a technique that first began to be developed and used in the late 1960s. Prior to the discovery of this method, identifying the main reflections produced by the many layers under the ground was fairly difficult. The primary reflections of these layers were often obscured by the background noise as well as noise from the many secondary reflections that are produced. The use of semblance analysis allows for the removal of the extra noise and leaves only the primary reflection.
+
+
+== Process ==
+
+Semblance analysis allows for the refinement of seismic data. This is done by developing a velocity spectra display to determine the velocity through different layers at depth. The easiest way to accomplish this is by recording the normal incidence path (NIP). The NIP is where you have the shot and the geophone in the same location and the path taken by the recorded sound waves is perpendicular to the boundaries between the layers. This path represents the shortest amount of time that can be taken to reach a layer and return. With this information it becomes fairly easy to calculate the velocity of the waves as they travel through each layer by using the equation for the root mean square velocity starting with the top layer and working downward.
+
+$$V_{\mathrm{rms}} = \sqrt{\frac{\sum t_{i} \cdot {V_{i}}^{2}}{\sum t_{i}}}$$
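The root-mean-square formula above can be sketched in a few lines of Python. This is only an illustration: the function name `rms_velocity` and the two-layer values are assumptions made for the example, not data from the source.

```python
import math

def rms_velocity(times, velocities):
    """Return sqrt(sum(t_i * V_i**2) / sum(t_i)) for the layers above a reflector."""
    if len(times) != len(velocities) or not times:
        raise ValueError("times and velocities must be non-empty and of equal length")
    weighted = sum(t * v ** 2 for t, v in zip(times, velocities))
    return math.sqrt(weighted / sum(times))

# Hypothetical two-layer model: travel times in seconds, interval velocities in m/s.
layer_times = [0.4, 0.6]
layer_velocities = [1500.0, 2500.0]
v_rms = rms_velocity(layer_times, layer_velocities)
print(round(v_rms, 1))
```

Because the squared velocities are weighted by travel time, the result always lies between the slowest and the fastest layer velocity.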
+
+Once all of the velocities for the layers are known, it is possible to calculate the time needed for the wave to travel the distance down to the midpoint between each geophone and the shot point for each of the layers. The farther a geophone is from the shot, the longer the wave takes to reach it, which forms a hyperbola in a graph of time vs. distance. The velocity data is used to correct the curves of the hyperbolas and create a flat line where all points are at an equal depth. The final step of the semblance analysis is to sum all of the data that has been corrected for velocity. This is done with a computer filter that sums together all of the events that the traces share and removes the ones they do not. The result is a single data set that has all of the primary peaks strongly displayed with most of the noise removed.
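The hyperbola and its flattening can be illustrated with a short Python sketch. It assumes the standard hyperbolic moveout relation t(x) = sqrt(t0^2 + (x / V_rms)^2), which the text describes but does not write out; the function names and numeric values are hypothetical.

```python
import math

def reflection_time(t0, offset, v_rms):
    """Two-way travel time at a shot-geophone offset (assumed hyperbolic moveout)."""
    return math.sqrt(t0 ** 2 + (offset / v_rms) ** 2)

def moveout_correction(t0, offset, v_rms):
    """Time shift that flattens the hyperbola back to the zero-offset time t0."""
    return reflection_time(t0, offset, v_rms) - t0

# Hypothetical values: t0 = 1.0 s, V_rms = 2000 m/s, offsets in metres.
offsets = [0.0, 500.0, 1000.0, 1500.0]
times = [reflection_time(1.0, x, 2000.0) for x in offsets]
corrected = [t - moveout_correction(1.0, x, 2000.0) for t, x in zip(times, offsets)]
print(times)
print(corrected)
```

With the correct velocity every corrected time falls back on t0, so the traces line up and can be summed; with a wrong velocity the event stays curved and the sum is weaker.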
+
+
+== Problems ==
+While this technique can be very useful, there are several situations in which it will not work. Semblance analysis will not work properly when the offset from the shot is greater than the depth of the reflecting layers because the data no longer has a hyperbolic pattern. To correct this it is necessary to use more complex equations that model non-hyperbolic moveout. In addition, at large offsets there can be polarity reversals with moveout, in which case the data will be heavily distorted. To make moveout analysis suitable for data with polarity reversals, a method known as AK semblance was developed. This method first worked only for 2D models but has since been further refined for 3D as well.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-0.md b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-0.md
new file mode 100644
index 000000000..ae7289940
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-0.md
@@ -0,0 +1,21 @@
+---
+title: "Sequence analysis in social sciences"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:08.316729+00:00"
+instance: "kb-cron"
+---
+
+In social sciences, sequence analysis (SA) is concerned with the analysis of sets of categorical sequences that typically describe longitudinal data. Analyzed sequences are encoded representations of, for example, individual life trajectories such as family formation, school to work transitions, working careers, but they may also describe daily or weekly time use or represent the evolution of observed or self-reported health, of political behaviors, or the development stages of organizations. Such sequences are chronologically ordered unlike words or DNA sequences for example.
+SA is a longitudinal analysis approach that is holistic in the sense that it considers each sequence as a whole. SA is essentially exploratory. Broadly, SA provides a comprehensible overall picture of sets of sequences with the objective of characterizing the structure of the set of sequences, finding the salient characteristics of groups, identifying typical paths, comparing groups, and more generally studying how the sequences are related to covariates such as sex, birth cohort, or social origin.
+Introduced in the social sciences in the 1980s by Andrew Abbott, SA has gained much popularity after the release of dedicated software such as the SQ and SADI addons for Stata and the TraMineR R package with its companions TraMineRextras and WeightedCluster.
+Despite some connections, the aims and methods of SA in social sciences strongly differ from those of sequence analysis in bioinformatics.
+
+== History ==
+Sequence analysis methods were first imported into the social sciences from the information and biological sciences (see Sequence alignment) by the University of Chicago sociologist Andrew Abbott in the 1980s, and they have since developed in ways that are unique to the social sciences. Scholars in psychology, economics, anthropology, demography, communication, political science, learning sciences, organizational studies, and especially sociology have been using sequence methods ever since.
+In sociology, sequence techniques are most commonly employed in studies of patterns of life-course development, cycles, and life histories. There has been a great deal of work on the sequential development of careers, and there is increasing interest in how career trajectories intertwine with life-course sequences. Many scholars have used sequence techniques to model how work and family activities are linked in household divisions of labor and the problem of schedule synchronization within families. The study of interaction patterns is increasingly centered on sequential concepts, such as turn-taking, the predominance of reciprocal utterances, and the strategic solicitation of preferred types of responses (see Conversation Analysis). Social network analysts (see Social network analysis) have begun to turn to sequence methods and concepts to understand how social contacts and activities are enacted in real time, and to model and depict how whole networks evolve. Social network epidemiologists have begun to examine social contact sequencing to better understand the spread of disease. Psychologists have used those methods to study how the order of information affects learning, and to identify structure in interactions between individuals (see Sequence learning).
+Many of the methodological developments in sequence analysis came on the heels of a special section devoted to the topic in a 2000 issue of Sociological Methods & Research, which hosted a debate over the use of the optimal matching (OM) edit distance for comparing sequences. In particular, sociologists objected to the descriptive and data-reducing orientation of optimal matching, as well as to a lack of fit between bioinformatic sequence methods and uniquely social phenomena. The debate has given rise to several methodological innovations (see Pairwise dissimilarities below) that address limitations of early sequence comparison methods developed in the 20th century. In 2006, David Stark and Balazs Vedres proposed the term "social sequence analysis" to distinguish the approach from bioinformatic sequence analysis. However, apart from the book by Benjamin Cornwell, the term was seldom used, probably because the context prevents any confusion in the SA literature. Sociological Methods & Research organized a special issue on sequence analysis in 2010, leading to what Aisenbrey and Fasang referred to as the "second wave of sequence analysis", which mainly extended optimal matching and introduced other techniques to compare sequences. Alongside sequence comparison, recent advances in SA concerned, among others, the visualization of sets of sequence data, the measurement and analysis of the discrepancy of sequences, the identification of representative sequences, and the development of summary indicators of individual sequences. Raab and Struffolino have conceived more recent advances as the third wave of sequence analysis. This wave is largely characterized by the effort of bringing together the stochastic and the algorithmic modeling culture by jointly applying SA with more established methods such as analysis of variance, event history analysis, Markovian modeling, social network analysis, or causal analysis and statistical modeling in general.
+
+== Domain-specific theoretical foundation ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-1.md b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-1.md
new file mode 100644
index 000000000..cfc98d79d
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-1.md
@@ -0,0 +1,15 @@
+---
+title: "Sequence analysis in social sciences"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:08.316729+00:00"
+instance: "kb-cron"
+---
+
+=== Sociology ===
+The analysis of sequence patterns has foundations in sociological theories that emerged in the middle of the 20th century.  Structural theorists argued that society is a system that is characterized by regular patterns. Even seemingly trivial social phenomena are ordered in highly predictable ways. This idea serves as an implicit motivation behind social sequence analysts' use of optimal matching, clustering, and related methods to identify common "classes" of sequences at all levels of social organization, a form of pattern search. This focus on regularized patterns of social action has become an increasingly influential framework for understanding microsocial interaction and contact sequences, or "microsequences." This is closely related to Anthony Giddens's theory of structuration, which holds that social actors' behaviors are predominantly structured by routines, and which in turn provides predictability and a sense of stability in an otherwise chaotic and rapidly moving social world. This idea is also echoed in Pierre Bourdieu's concept of habitus, which emphasizes the emergence and influence of stable worldviews in guiding everyday action and thus produce predictable, orderly sequences of behavior. The resulting influence of routine as a structuring influence on social phenomena was first illustrated empirically by Pitirim Sorokin, who led a 1939 study that found that daily life is so routinized that a given person is able to predict with about 75% accuracy how much time they will spend doing certain things the following day. 
Talcott Parsons's argument that all social actors are mutually oriented to their larger social systems (for example, their family and larger community) through social roles also underlies social sequence analysts' interest in the linkages that exist between different social actors' schedules and ordered experiences, which has given rise to a considerable body of work on synchronization between social actors and their social contacts and larger communities. All of these theoretical orientations together warrant critiques of the general linear model of social reality, which as applied in most work implies that society is either static or that it is highly stochastic in a manner that conforms to Markov processes. This concern inspired the initial framing of social sequence analysis as an antidote to general linear models. It has also motivated recent attempts to model sequences of activities or events as elements that link social actors in non-linear network structures. This work, in turn, is rooted in Georg Simmel's theory that experiencing similar activities, experiences, and statuses serves as a link between social actors.
+
+=== Demography and historical demography ===
+In demography and historical demography, from the 1980s the rapid appropriation of the life course perspective and methods was part of a substantive paradigmatic change that implied a stronger embedding of demographic processes into social sciences dynamics. After a first phase with a focus on the occurrence and timing of demographic events studied separately from each other with a hypothetico-deductive approach, from the early 2000s the need to consider the structure of the life courses and to do justice to its complexity led to a growing use of sequence analysis with the aim of pursuing a holistic approach. At an inter-individual level, pairwise dissimilarities and clustering appeared as the appropriate tools for revealing the heterogeneity in human development. For example, the meta-narrations contrasting individualized Western societies with collectivist societies in the South (especially in Asia) were challenged by comparative studies revealing the diversity of pathways to legitimate reproduction. At an intra-individual level, sequence analysis integrates the basic life course principle that individuals interpret and make decisions about their life according to their past experiences and their perception of contingencies. Interest in this perspective was also promoted by the changes in individuals' life courses for cohorts born between the beginning and the end of the 20th century. These changes have been described as de-standardization, de-synchronization, and de-institutionalization. Among the drivers of these dynamics, the transition to adulthood is key: for more recent birth cohorts this crucial phase along individual life courses implied a larger number of events and lengths of the state spells experienced.
For example, many postponed leaving the parental home and the transition to parenthood; in some contexts cohabitation replaced marriage as a long-lasting living arrangement, and the birth of the first child occurs more frequently while parents cohabit than within wedlock. Such complexity needed to be measured in order to compare quantitative indicators across birth cohorts (see for an extension of this questioning to populations from low- and medium-income countries). Demography's long-standing ambition to develop a 'family demography' has found in sequence analysis a powerful tool to address research questions at the crossroads with other disciplines: for example, multichannel techniques represent precious opportunities to deal with the issue of compatibility between working and family lives. Similarly, more recent combinations of sequence analysis and event history analysis have been developed (see for a review) and can be applied, for instance, for understanding the link between demographic transitions and health.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-2.md b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-2.md
new file mode 100644
index 000000000..007c5eaf1
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-2.md
@@ -0,0 +1,64 @@
+---
+title: "Sequence analysis in social sciences"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:08.316729+00:00"
+instance: "kb-cron"
+---
+
+=== Political sciences ===
+The analysis of temporal processes in the domain of political sciences concerns how institutions, that is, systems and organizations (regimes, governments, parties, courts, etc.), crystallize political interactions, formalize legal constraints and impose a degree of stability or inertia. Special importance is given to, first, the role of contexts, which confer meaning to trends and events, while shared contexts offer shared meanings; second, to changes over time in power relationships, and, subsequently, asymmetries, hierarchies, contention, or conflict; and, finally, to historical events that are able to shape trajectories, such as elections, accidents, inaugural speeches, treaties, revolutions, or ceasefires. Empirically, political sequences' unit of analysis can be individuals, organizations, movements, or institutional processes. Depending on the unit of analysis, the sample sizes may be limited to a few cases (e.g., regions in a country when considering the turnover of local political parties over time) or include a few hundred (e.g., individuals' voting patterns). Three broad kinds of political sequences may be distinguished. The first and most common is careers, that is, formal, mostly hierarchical positions along which individuals progress in institutional environments, such as parliaments, cabinets, administrations, parties, unions or business organizations. We may call trajectories the political sequences that develop in more informal and fluid contexts, such as activists evolving across various causes and social movements, or voters navigating a political and ideological landscape across successive polls.
Finally, processes relate to non-individual entities, such as: public policies developing through successive policy stages across distinct arenas; sequences of symbolic or concrete interactions between national and international actors in diplomatic and military contexts; and development of organizations or institutions, such as pathways of countries towards democracy (Wilson 2014).
+
+== Concepts ==
+A sequence s is an ordered list of elements (s1,s2,...,sl) taken from a finite alphabet A. For a set S of sequences, three sizes matter: the number n of sequences, the size a = |A| of the alphabet, and the length l of the sequences (which may differ from one sequence to another). In social sciences, n is generally between a few hundred and a few thousand, the alphabet size remains limited (most often less than 20), while sequence length rarely exceeds 100.
+We may distinguish between state sequences and event sequences: states last, while events occur at one time point and do not last but contribute, possibly together with other events, to state changes. For instance, the joint occurrence of the two events leaving home and starting a union provokes a state change from 'living at home with parents' to 'living with a partner'.
+When a state sequence is represented as the list of states observed at the successive time points, the position of each element in the sequence conveys this time information and the distance between positions reflects duration. An alternative, more compact representation of a sequence is the list of the successive spells stamped with their duration, where a spell (also called episode) is a substring in the same state. For example, in aabbbc, bbb is a spell of length 3 in state b, and the whole sequence can be represented as (a,2)-(b,3)-(c,1).
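The two representations of a state sequence can be converted into each other with a run-length encoding. The Python sketch below reproduces the aabbbc example; the helper names `to_spells` and `from_spells` are illustrative and assume single-character state labels.

```python
from itertools import groupby

def to_spells(sequence):
    """Compress a state sequence into (state, duration) spells."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(sequence)]

def from_spells(spells):
    """Expand (state, duration) spells back into a position-wise state sequence."""
    return "".join(state * duration for state, duration in spells)

spells = to_spells("aabbbc")
distinct_successive_states = [state for state, _ in spells]
print(spells)                      # [('a', 2), ('b', 3), ('c', 1)]
print(distinct_successive_states)  # ['a', 'b', 'c']
```

The spell list keeps the order of the states and the duration of each, so expanding it recovers the original sequence exactly.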
+
+A crucial point when looking at state sequences is the timing scheme used to time align the sequences. This could be the historical calendar time, or a process time such as age, i.e. time since birth.
+In event sequences, positions do not convey any time information. Therefore event occurrence time must be explicitly provided (as a timestamp) when it matters.
+SA is essentially concerned with state sequences.
+
+== Methods ==
+Conventional SA consists essentially in building a typology of the observed trajectories. Abbott and Tsay (2000) describe this typical SA as a three-step program: 1. Coding individual narratives as sequences of states; 2. Measuring pairwise dissimilarities between sequences; and 3. Clustering the sequences from the pairwise dissimilarities. However, SA is much more (see e.g.) and encompasses also among others the description and visual rendering of sets of sequences, ANOVA-like analysis and regression trees for sequences, the identification of representative sequences, the study of the relationship between linked sequences (e.g. dyadic, linked-lives, or various life dimensions such as occupation, family, health), and sequence-network.
+
+=== Describing and rendering state sequences ===
+Given an alignment rule, a set of sequences can be represented in tabular form with sequences in rows and columns corresponding to the positions in the sequences.
+
+==== Sequences of cross-sectional distributions ====
+To describe such data, we may look at the columns and consider the cross-sectional state distributions at the successive positions.
+The chronogram or density plot of a set of sequences renders these successive cross-sectional distributions.
+
+For each (column) distribution we can compute characteristics such as entropy or modal state and look at how these values evolve over the positions (see  pp 18–21).
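As a rough illustration of these column-wise summaries, the following Python sketch computes the state distribution, Shannon entropy (natural logarithm, not normalized), and modal state at each position. The function names and the four toy sequences are assumptions made for the example, and the sequences are assumed to be aligned and of equal length.

```python
import math
from collections import Counter

def cross_sectional_distributions(sequences):
    """State proportions at each position of equally long, aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned and of equal length")
    distributions = []
    for position in range(length):
        counts = Counter(seq[position] for seq in sequences)
        total = sum(counts.values())
        distributions.append({state: c / total for state, c in counts.items()})
    return distributions

def entropy(distribution):
    """Shannon entropy of one cross-sectional distribution."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0)

def modal_state(distribution):
    """Most frequent state at one position."""
    return max(distribution, key=distribution.get)

sequences = ["aabb", "abbb", "aabc", "abbc"]  # hypothetical toy data
distributions = cross_sectional_distributions(sequences)
entropies = [entropy(d) for d in distributions]
print(entropies)
```

An entropy of zero means that every sequence is in the same state at that position; higher values signal more cross-sectional diversity.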
+
+==== Characteristics of individual sequences ====
+Alternatively, we can look at the rows. The index plot where each sequence is represented as a horizontal stacked bar or line is the basic plot for rendering individual sequences.
+
+We can compute characteristics of the individual sequences and examine the cross-sectional distribution of these characteristics.
+Main indicators of individual sequences
+
+Basic measures
+Length
+Number of states visited
+Number of transitions (length of sequence of distinct successive states, DSS)
+Number of subsequences
+Recurrence
+Diversity
+Within sequence entropy
+Variance of spell duration
+Complexity of the sequence structure
+Volatility
+Complexity index
+Turbulence
+Measures that take account of the nature of the states
+Normative volatility i.e. proportion of positive spells.
+Integration index also known as Quality index
+Degradation
+Badness
+Precarity index
+Insecurity
+
+==== Other overall descriptive measures ====
+Mean time in the different states (overall state distribution) and their standard errors
+Transition probabilities between states.
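Transition probabilities between states can be estimated by counting how often each state is followed by each other state at the next position. The Python sketch below is one simple way to do this; the function name `transition_rates` and the two toy sequences are hypothetical.

```python
from collections import Counter, defaultdict

def transition_rates(sequences):
    """Estimate P(next state | current state) from successive positions."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }

rates = transition_rates(["aabbbc", "abbbcc"])
print(rates)
```

Each row of the resulting table sums to one, and the diagonal entries (a state followed by itself) reflect how long spells tend to last.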
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-3.md b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-3.md
new file mode 100644
index 000000000..9205e2bfb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-3.md
@@ -0,0 +1,99 @@
+---
+title: "Sequence analysis in social sciences"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:08.316729+00:00"
+instance: "kb-cron"
+---
+
+=== Visualization ===
+State sequences can nicely be rendered graphically and such plots prove useful for interpretation purposes. As shown above, the two basic plots are the index plot that renders individual sequences and the chronogram that renders the evolution of the cross-sectional state distribution along the timeframe. Chronograms (also known as status proportion plot or state distribution plot) completely overlook the diversity of the sequences, while index plots are often too scattered to be readable. Relative frequency plots and plots of representative sequences attempt to increase the readability of index plots without falling in the oversimplification of a chronogram. In addition, there are many plots that focus on specific characteristics of the sequences. Below is a list of plots that have been proposed in the literature for rendering large sets of sequences. For each plot, we give examples of software (details in section Software) that produce it.
+
+Index plot: renders the set of individual sequences (SADI, SQ, TraMineR)
+Chronogram (status proportion plot, state distribution plot): renders the sequence of cross-sectional distributions (SADI, SQ, TraMineR)
+Plot of multidomain/multichannel sequences grouped by channels (TraMineR, seqHMM) or by individuals
+Plot of time series of cross-sectional indicators (entropy, modal state, ...) (SQ, TraMineR)
+Frequency plot (SQ, TraMineR)
+Relative frequency plot (TraMineR)
+Representative sequences (TraMineR)
+Mean time in the different states and their standard errors (TraMineR)
+State survival plot (TraMineRextras)
+Position-wise group typical states, i.e., with highest implication strength (TraMineRextras)
+Transition patterns (SADI)
+Transition plot (SQ; Gmisc) and plot of transition probabilities (seqHMM)
+Parallel coordinate plot (TraMineR, SQ)
+Probabilistic suffix trees (PST)
+Sequence networks (see social network analysis, Social network analysis software)
+Narrative networks (Software?)
+
+=== Pairwise dissimilarities ===
+Pairwise dissimilarities between sequences serve to compare sequences, and many advanced SA methods are based on these dissimilarities. The most popular dissimilarity measure is optimal matching (OM), i.e. the minimal cost of transforming one sequence into the other by means of indel (insert or delete) and substitution operations, where the costs of these elementary operations may depend on the states involved. SA is so intimately linked with OM that it is sometimes named optimal matching analysis (OMA).
+There are roughly three categories of dissimilarity measures:
+
+Optimal matching and other edit distances
+Examples: OM, OMloc (localized OM),  OMslen (spell-length sensitive OM), OMspell (OM of spell sequences), OMstran (OM of sequences of transitions), TWED (time-warp edit distance), HAM (Hamming and generalized Hamming), DHD (Dynamic Hamming).
+Strategies for setting the substitution and indel costs
+Constant costs (all substitution costs identical and single indel cost)
+Theory-based costs
+Feature-based costs
+Data-driven costs: based on transition probabilities or state frequencies
+Measures based on the count of common attributes
+Examples: LCS (derived from length of longest common subsequence), LCP (from length of longest common prefix), NMS (number of matching subsequences), and NMSMST and SVRspell two variants of NMS.
+Distances between within-sequence state distributions
+Examples: CHI2 and EUCLID defined as the average of respectively the Chi-squared and Euclidean distance between state distributions in successive sliding windows.
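Some of the simpler measures in this list can be sketched in a few lines of Python. The functions below are illustrative only: `hamming` counts mismatching positions, `lcp_length` measures the longest common prefix, and `lcs_distance` assumes the common definition of the LCS distance as the sum of the two lengths minus twice the length of the longest common subsequence.

```python
def hamming(seq_a, seq_b):
    """Number of positions at which two equally long sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))

def lcp_length(seq_a, seq_b):
    """Length of the longest common prefix."""
    length = 0
    for x, y in zip(seq_a, seq_b):
        if x != y:
            break
        length += 1
    return length

def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    previous = [0] * (len(seq_b) + 1)
    for x in seq_a:
        current = [0]
        for j, y in enumerate(seq_b, start=1):
            current.append(previous[j - 1] + 1 if x == y else max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]

def lcs_distance(seq_a, seq_b):
    """Assumed definition: |a| + |b| - 2 * LCS(a, b)."""
    return len(seq_a) + len(seq_b) - 2 * lcs_length(seq_a, seq_b)

print(hamming("aabbbc", "abbbcc"), lcp_length("aabbbc", "abbbcc"), lcs_distance("aabbbc", "abbbcc"))
```

Hamming compares sequences position by position and is therefore sensitive to timing, whereas the LCS-based distance tolerates shifts because a subsequence need not be contiguous.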
+
+=== Dissimilarity-based analysis ===
+Pairwise dissimilarities between sequences give access to a series of techniques to discover holistic structuring characteristics of the sequence data. In particular, dissimilarities between sequences can serve as input to cluster algorithms and multidimensional scaling, but they also make it possible to identify medoids or other representative sequences, define neighborhoods, measure the discrepancy of a set of sequences, carry out ANOVA-like analyses, and grow regression trees.
+
+Cluster analysis
+Descriptive: identification of main sequence patterns.
+Clusters as dependent or independent variables in regression analysis: study of relationships with other variables of interest.
+Multidimensional scaling (principal coordinates): numerical representation of sequences.
+Discrepancy (ANOVA-like) analysis
+Sequence of ANOVA-like analyses
+Regression trees
+Representative sequences
+Outliers and deviant sequences
+Multiple domains (multichannel analysis)
+Dyadic and polyadic sequence data
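As a small illustration of a dissimilarity-based technique, the Python sketch below builds a pairwise dissimilarity matrix and picks the medoid, i.e. the sequence with the smallest total dissimilarity to all others. The Hamming distance is used here only as a simple stand-in for whichever measure is chosen, and the function names and toy sequences are assumptions made for the example.

```python
def hamming(seq_a, seq_b):
    """Stand-in dissimilarity: number of differing positions."""
    return sum(x != y for x, y in zip(seq_a, seq_b))

def pairwise_matrix(sequences, dissimilarity):
    """Full symmetric matrix of pairwise dissimilarities."""
    return [[dissimilarity(a, b) for b in sequences] for a in sequences]

def medoid_index(matrix):
    """Index of the sequence with the smallest sum of dissimilarities to the others."""
    return min(range(len(matrix)), key=lambda i: sum(matrix[i]))

sequences = ["aabb", "aabc", "abbb", "aaab"]  # hypothetical toy data
matrix = pairwise_matrix(sequences, hamming)
medoid = sequences[medoid_index(matrix)]
print(medoid)
```

The same matrix could be passed to a clustering or multidimensional scaling routine; the medoid of each resulting cluster is one possible choice of representative sequence.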
+
+=== Other methods of analysis ===
+Although dissimilarity-based methods play a central role in social SA, essentially because of their ability to preserve the holistic perspective, several other approaches also prove useful for analyzing sequence data.
+
+Non dissimilarity-based clustering
+Latent class analysis (LCA),
+Markov model mixture and hidden Markov model mixture
+Mixtures of exponential-distance models
+Sequence networks
+Representing a single sequence as a network
+Meta network of sequences
+Sequence network measures
+Life history graph
+Probabilistic approaches
+Markovian and other transition distribution models. See also Markov model.
+Probabilistic Suffix Tree (PST) also known as variable-order Markov model or variable-length Markov model.
+Event sequences
+Event structure models
+Rendering of event sequences (parallel coordinate plots, ...)
+Frequent subsequences
+Discriminant subsequences
+Dissimilarity-based analysis of event sequences
+Representation learning with deep neural networks
+
+=== Advances: the third wave of sequence analysis ===
+Some recent advances can be conceived as the third wave of SA. This wave is largely characterized by the effort of bringing together the stochastic and the algorithmic modeling culture by jointly applying SA with more established methods such as analysis of variance, event history, network analysis, or causal analysis and statistical modeling in general. Some examples are given below; see also "Other methods of analysis".
+
+Effect of past trajectories on the hazard of an event: Sequence History Analysis, SHA
+Effect of time varying covariates on trajectories: Competing Trajectories Analysis (CTA), and Sequence Analysis Multistate Model (SAMM)
+Validation of cluster typologies
+Discrepancy analysis to bring time back to qualitative comparative analysis (QCA)
+
+=== Open issues and limitations ===
+Although SA witnesses a steady inflow of methodological contributions that address the issues raised two decades ago, some pressing open issues remain. Among the most challenging, we can mention:
+
+Sequences of different lengths, truncated sequences, and missing values.
+Validation of cluster results
+Sequence length vs importance of recency: for example, when analyzing 40-year-long biographic sequences from age 1 to 40, one can only consider individuals born at least 40 years earlier, and therefore the behavior of younger birth cohorts is disregarded.
+Up-to-date information on advances, methodological discussions, and recent relevant publications can be found on the Sequence Analysis Association webpage.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-4.md b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-4.md
new file mode 100644
index 000000000..3465edcfb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences-4.md
@@ -0,0 +1,71 @@
+---
+title: "Sequence analysis in social sciences"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Sequence_analysis_in_social_sciences"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:08.316729+00:00"
+instance: "kb-cron"
+---
+
+== Fields of application ==
+These techniques have proved valuable in a variety of contexts. In life-course research, for example, research has shown that retirement plans are affected not just by the last year or two of one's life, but instead how one's work and family careers unfolded over a period of several decades. People who followed an "orderly" career path (characterized by consistent employment and gradual ladder-climbing within a single organization) retired earlier than others, including people who had intermittent careers, those who entered the labor force late, as well as those who enjoyed regular employment but who made numerous lateral moves across organizations throughout their careers. In the field of economic sociology, research has shown that firm performance depends not just on a firm's current or recent social network connectedness, but also the durability or stability of their connections to other firms. Firms that have more "durably cohesive" ownership network structures attract more foreign investment than less stable or poorly connected structures. Research has also used data on everyday work activity sequences to identify classes of work schedules, finding that the timing of work during the day significantly affects workers' abilities to maintain connections with the broader community, such as through community events. More recently, social sequence analysis has been proposed as a meaningful approach to study trajectories in the domain of creative enterprise, allowing the comparison among the idiosyncrasies of unique creative careers. While other methods for constructing and analyzing whole sequence structure have been developed during the past three decades, including event structure analysis, OM and other sequence comparison methods form the backbone of research on whole sequence structures.
+Some examples of application include:
+Sociology
+
+- Labor market entry sequences
+- De-standardization of the life course
+- Residential trajectories
+- Time use
+- Actual and idealized relationship scripts
+- Basic types of figures in ritual dances
+- Pathways of alcohol consumption
+
+Demography and historical demography
+
+- Transition to adulthood
+- Partnership biographies
+- Family formation life course
+- Childbirth histories
+
+Political sciences
+
+- Pathways towards democratization
+- Pathways of legislative processes
+- Bargaining between actors during national crises
+
+Education and learning sciences
+
+- Study trajectories
+- Learning strategies
+
+Psychology
+
+- Sequences of adolescents' social interactions
+
+Medical research
+
+- Care trajectory in chronic disease
+
+Survey methodology
+
+- Response in survey collection
+
+Geography
+
+- Mobility studies
+- Regional development
+- Land use
+
+== Software ==
+Two main statistical computing environments and one general-purpose programming language offer tools for sequence analysis in the form of user-written packages: Stata, R, and Python.
+
+Stata: SQ and SADI are general SA toolkits. MICT is dedicated to imputation of missing elements in sequences.
+R: TraMineR, with its extension TraMineRextras, is probably the most comprehensive SA toolkit; ggseqplot provides ggplot versions of most TraMineR plots; seqhandbook provides several specific tools such as heat maps of sequence data and the GIMSA method for measuring dissimilarities between multidomain sequences; seqimpute provides tools for imputing missing elements in sequences; seqHMM, although specialized in fitting Markov models, provides useful plotting facilities for rendering multichannel sequences and transition probabilities; WeightedCluster is a versatile clustering package with original tools for grouping identical sequences and rendering hierarchical trees of sequences; PST fits and renders probabilistic suffix trees of sequences.
+Python: Sequenzo is an SA toolkit offering a Python implementation of several functionalities of TraMineR, WeightedCluster, and seqHMM.
+
+== Institutional development ==
+The first international conference dedicated to social-scientific research that uses sequence analysis methods, the Lausanne Conference on Sequence Analysis (LaCOSA), was held in Lausanne, Switzerland in June 2012. A second conference (LaCOSA II) was held in Lausanne in June 2016. The Sequence Analysis Association (SAA) was founded at the International Symposium on Sequence Analysis and Related Methods, in October 2018 at Monte Verità, TI, Switzerland. The SAA is an international organization whose goal is to organize events such as symposia and training courses, and to facilitate scholars' access to sequence analysis resources.
+
+== External links ==
+The homepage of the Sequence Analysis Association.
+Andrew Abbott's 1995 review of sociological approaches to sequence analysis.
+The TraMineR page
+Brendan Halpin's sequence analysis page at the University of Limerick.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Seriation_(statistics)-0.md b/data/en.wikipedia.org/wiki/Seriation_(statistics)-0.md
new file mode 100644
index 000000000..5b5c8229b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Seriation_(statistics)-0.md
@@ -0,0 +1,14 @@
+---
+title: "Seriation (statistics)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Seriation_(statistics)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:09.506205+00:00"
+instance: "kb-cron"
+---
+
+In combinatorial data analysis, seriation is the process of finding an arrangement of all objects in a set, in a linear order, given a loss function. The main goal is exploratory, to reveal structural information.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Single-particle_trajectory-0.md b/data/en.wikipedia.org/wiki/Single-particle_trajectory-0.md
new file mode 100644
index 000000000..992bb7ed2
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Single-particle_trajectory-0.md
@@ -0,0 +1,372 @@
+---
+title: "Single-particle trajectory"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Single-particle_trajectory"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:11.886018+00:00"
+instance: "kb-cron"
+---
+
+Single-particle trajectories (SPTs) consist of a collection of successive discrete points causal in time. These trajectories are acquired from images in experimental data. In the context of cell biology, the trajectories are obtained by the transient activation by a laser of small dyes attached to a moving molecule.
+Molecules can now be visualized using recent super-resolution microscopy, which allows the routine collection of thousands of short and long trajectories. These trajectories explore part of a cell, either on the membrane or in three dimensions, and their paths are critically influenced by the local crowded organization and molecular interactions inside the cell, as emphasized in various cell types such as neuronal cells, astrocytes, immune cells and many others.
+
+== SPTs allow observing moving molecules inside cells to collect statistics ==
+SPTs allow moving particles to be observed. These trajectories are used to investigate cytoplasm or membrane organization, but also cell nucleus dynamics, remodeler dynamics or mRNA production. Due to the constant improvement of the instrumentation, the spatial resolution is continuously improving, now reaching approximately 20 nm, while the acquisition time step is usually in the range of 10 to 50 ms to capture short events occurring in live tissues. A variant of super-resolution microscopy called sptPALM is used to detect the local and dynamically changing organization of molecules in cells, or events of DNA binding by transcription factors in the mammalian nucleus. Super-resolution image acquisition and particle tracking are crucial to guarantee high-quality data.
+
+== Assembling points into a trajectory based on tracking algorithms ==
+Once points are acquired, the next step is to reconstruct a trajectory. This step uses known tracking algorithms to connect the acquired points. Tracking algorithms are based on a physical model of trajectories perturbed by an additive random noise.
+
+== Extract physical parameters from redundant SPTs ==
+The redundancy of many short SPTs is a key feature for extracting biophysical parameters from empirical data at a molecular level. In contrast, long isolated trajectories have been used to extract information along trajectories, destroying the natural spatial heterogeneity associated with the various positions. The main statistical tool is the mean-square displacement (MSD), or second-order statistical moment:
+
+$$\langle |X(t+\Delta t)-X(t)|^{2}\rangle \sim t^{\alpha }$$
+
+(average over realizations), where $\alpha$ is called the anomalous exponent.
+For a Brownian motion, $\langle |X(t+\Delta t)-X(t)|^{2}\rangle =2nDt$, where D is the diffusion coefficient and n is the dimension of the space. Some other properties can also be recovered from long trajectories, such as the radius of confinement for a confined motion. The MSD has been widely used in early applications of long but not necessarily redundant single-particle trajectories in a biological context. However, the MSD applied to long trajectories suffers from several issues. First, it is not precise, in part because the measured points could be correlated. Second, it cannot be used to compute any physical diffusion coefficient when trajectories consist of switching episodes, for example alternating between free and confined diffusion. At low spatiotemporal resolution of the observed trajectories, the MSD behaves sublinearly with time, a process known as anomalous diffusion, which is due in part to the averaging of the different phases of the particle motion. In the context of cellular transport (amoeboid), high-resolution motion analysis of long SPTs in micro-fluidic chambers containing obstacles revealed different types of cell motion depending on the obstacle density: crawling was found at low obstacle density, and directed motion and random phases can even be differentiated.
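+
+As a rough worked illustration of the Brownian formula above (the numerical values are assumptions chosen for illustration, not measurements from any cited study), consider a molecule moving in n = 2 dimensions with D = 0.1 μm²/s, observed over an elapsed time of 20 ms:
+
+$$2nDt = 2\times 2\times 0.1\ \mu\mathrm{m}^{2}/\mathrm{s}\times 0.02\ \mathrm{s} = 0.008\ \mu\mathrm{m}^{2},$$
+
+which corresponds to a root-mean-square displacement of about $\sqrt{0.008}\ \mu\mathrm{m}\approx 0.09\ \mu\mathrm{m}$, that is, roughly 90 nm. Under the power-law form of the MSD, the anomalous exponent can be read off as a slope in log-log coordinates, for instance from two elapsed times $t_{1}<t_{2}$:
+
+$$\alpha \approx \frac{\log \mathrm{MSD}(t_{2})-\log \mathrm{MSD}(t_{1})}{\log t_{2}-\log t_{1}}.$$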
+
+== Physical model to recover spatial properties from redundant SPTs ==
+
+=== Langevin and Smoluchowski equations as a model of motion ===
+Statistical methods to extract information from SPTs are based on stochastic models, such as the Langevin equation or its Smoluchowski limit, and on associated models that account for additional localization noise or a memory kernel. The Langevin equation describes a stochastic particle driven by a Brownian force $\Xi$ and a force field (e.g., electrostatic, mechanical, etc.) with expression $F(x,t)$:
+
+$$m{\ddot {x}}+\Gamma {\dot {x}}-F(x,t)=\Xi ,$$
+
+where m is the mass of the particle, $\Gamma =6\pi a\rho$ is the friction coefficient of a diffusing particle and $\rho$ is the viscosity. Here $\Xi$ is the $\delta$-correlated Gaussian white noise. The force can be derived from a potential well U so that $F(x,t)=-U'(x)$, and in that case the equation takes the form
+
+$$m{\frac {d^{2}x}{dt^{2}}}+\Gamma {\frac {dx}{dt}}+\nabla U(x)={\sqrt {2\varepsilon \gamma }}\,{\frac {d\eta }{dt}},$$
+
+where $\varepsilon =k_{\text{B}}T$ is the energy, $k_{\text{B}}$ the Boltzmann constant and T the temperature. Langevin's equation is used to describe trajectories where inertia or acceleration matters. For example, at very short timescales, when a molecule unbinds from a binding site or escapes from a potential well, the inertia term allows the particle to move away from the attractor and thus prevents immediate rebinding that could plague numerical simulations.
+In the large friction limit $\gamma \to \infty$, the trajectories $x(t)$ of the Langevin equation converge in probability to those of the Smoluchowski equation
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Single-particle_trajectory-1.md b/data/en.wikipedia.org/wiki/Single-particle_trajectory-1.md
new file mode 100644
index 000000000..a8337a3d1
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Single-particle_trajectory-1.md
@@ -0,0 +1,612 @@
+---
+title: "Single-particle trajectory"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Single-particle_trajectory"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:11.886018+00:00"
+instance: "kb-cron"
+---
+
+$$\gamma {\dot {x}}+U^{\prime }(x)={\sqrt {2\varepsilon \gamma }}\,{\dot {w}},$$
+
+where ${\dot {w}}(t)$ is $\delta$-correlated. This equation is obtained when the diffusion coefficient is constant in space. When this is not the case, coarse-grained equations (at a coarse spatial resolution) should be derived from molecular considerations. The interpretation of the physical forces is not resolved by the Itô versus Stratonovich integral representations or any others.
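+
+A short worked step, added here as a sketch, makes the link with a diffusion coefficient explicit. Dividing the Smoluchowski equation above by a constant $\gamma$ gives
+
+$$\dot{x} = -\frac{U'(x)}{\gamma} + \sqrt{\frac{2\varepsilon}{\gamma}}\,\dot{w},$$
+
+so that, writing the noise amplitude as $\sqrt{2D}$ (the usual convention, assumed here), one identifies the drift $-U'(x)/\gamma$ and the diffusion coefficient $D=\varepsilon /\gamma =k_{\text{B}}T/\gamma$, which is the Einstein relation.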
+
+=== General model equations ===
+For a timescale much longer than the elementary molecular collision, the position of a tracked particle is described by a more general overdamped limit of the Langevin stochastic model. Indeed, if the acquisition of empirical recorded trajectories is much slower than the thermal fluctuations, rapid events are not resolved in the data. Thus, at this coarser spatiotemporal scale, the motion description is replaced by an effective stochastic equation
+
+$${\dot {X}}(t)={b}(X(t))+{\sqrt {2}}{B}_{e}(X(t)){\dot {w}}(t),\qquad \qquad (1)$$
+
+where $b(X)$ is the drift field and $B_{e}$ the diffusion matrix. The effective diffusion tensor can vary in space, $D(X)={\frac {1}{2}}B(X)B^{T}(X)$ ($B^{T}$ denotes the transpose of $B$). This equation is not derived but assumed. However, the diffusion coefficient should be smooth enough, as any discontinuity in D should be resolved by a spatial scaling to analyse the source of the discontinuity (usually inert obstacles or transitions between two media). The observed effective diffusion tensor is not necessarily isotropic and can be state-dependent, whereas the friction coefficient $\gamma$ remains constant as long as the medium stays the same and the microscopic diffusion coefficient (or tensor) could remain isotropic.
+
+== Statistical analysis of these trajectories ==
+The development of statistical methods is based on stochastic models and on a possible deconvolution procedure applied to the trajectories. Numerical simulations could also be used to identify specific features that could be extracted from single-particle trajectory data. The goal of building a statistical ensemble from SPT data is to observe local physical properties of the particles, such as velocity, diffusion, confinement or attracting forces reflecting the interactions of the particles with their local nanometer environments. It is possible to use stochastic modeling to reconstruct, from the diffusion coefficient (or tensor), the confinement or local density of obstacles reflecting the presence of biological objects of different sizes.
+
+=== Empirical estimators for the drift and diffusion tensor of a stochastic process ===
+Several empirical estimators have been proposed to recover the local diffusion coefficient, vector field and even organized patterns in the drift, such as potential wells. Empirical estimators that recover physical properties can be constructed from parametric and non-parametric statistics. Retrieving statistical parameters of a diffusion process from one-dimensional time series statistics uses the first moment estimator or Bayesian inference.
+The models and the analysis assume that processes are stationary, so that the statistical properties of trajectories do not change over time. In practice, this assumption is satisfied when trajectories are acquired for less than a minute, during which only a few slow changes may occur, for example on the surface of a neuron. Non-stationary behavior is observed using a time-lapse analysis, with a delay of tens of minutes between successive acquisitions.
+The coarse-grained model Eq. 1 is recovered from the conditional moments of the trajectory by computing the increments $\Delta X=X(t+\Delta t)-X(t)$:
+
+$$a(x)=\lim _{\Delta t\rightarrow 0}{\frac {E[\Delta X(t)\mid X(t)=x]}{\Delta t}},$$
+
+$$D(x)=\lim _{\Delta t\rightarrow 0}{\frac {E[\Delta X(t)^{T}\,\Delta X(t)\mid X(t)=x]}{2\,\Delta t}}.$$
+
+Here the notation $E[\cdot \,|\,X(t)=x]$ means averaging over all trajectories that are at point x at time t. The coefficients of the Smoluchowski equation can be statistically estimated at each point x from an infinitely large sample of its trajectories in the neighborhood of the point x at time t.
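+
+To see why these two limits return the coefficients of the model, a one-dimensional sketch can help (an Euler discretization is assumed here, and $\xi$ is a standard Gaussian variable introduced only for this illustration). Over a short step $\Delta t$, a trajectory located at x moves approximately by
+
+$$\Delta X\approx a(x)\,\Delta t+{\sqrt {2D(x)\,\Delta t}}\;\xi ,$$
+
+so that
+
+$$E[\Delta X\mid X(t)=x]\approx a(x)\,\Delta t,\qquad E[(\Delta X)^{2}\mid X(t)=x]\approx 2D(x)\,\Delta t+a(x)^{2}\,\Delta t^{2}.$$
+
+Dividing by $\Delta t$ and $2\,\Delta t$ respectively recovers a(x) and D(x) as $\Delta t\to 0$. At a finite time step the second moment carries an extra term $a(x)^{2}\,\Delta t/2$, which is small when the drift displacement per step is small compared with the diffusive one.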
+
+=== Empirical estimation ===
+In practice, the expectations for a and D are estimated by finite sample averages, and $\Delta t$ is the time resolution of the recorded trajectories. Formulas for a and D are approximated at the time step $\Delta t$, with tens to hundreds of points falling in any bin; this is usually enough for the estimation.
+To estimate the local drift and diffusion coefficients, trajectories are first grouped within a small neighbourhood. The field of observation is partitioned into square bins $S(x_{k},r)$ of side r and centre $x_{k}$, and the local drift and diffusion are estimated for each square. Considering a sample with $N_{t}$ trajectories $\{x^{i}(t_{1}),\dots ,x^{i}(t_{N_{s}})\}$, where $t_{j}$ are the sampling times, the discretization of the equation for the drift $a(x_{k})=(a_{x}(x_{k}),a_{y}(x_{k}))$ at position $x_{k}$ is given for each spatial projection on the x and y axes by
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Single-particle_trajectory-2.md b/data/en.wikipedia.org/wiki/Single-particle_trajectory-2.md
new file mode 100644
index 000000000..f9d1e326c
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Single-particle_trajectory-2.md
@@ -0,0 +1,738 @@
+---
+title: "Single-particle trajectory"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Single-particle_trajectory"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:11.886018+00:00"
+instance: "kb-cron"
+---
+
+$$a_{x}(x_{k})\approx {\frac {1}{N_{k}}}\sum _{j=1}^{N_{t}}\sum _{i=0,\,{\tilde {x}}_{i}^{j}\in S(x_{k},r)}^{N_{s}-1}\left({\frac {x_{i+1}^{j}-x_{i}^{j}}{\Delta t}}\right)$$
+
+$$a_{y}(x_{k})\approx {\frac {1}{N_{k}}}\sum _{j=1}^{N_{t}}\sum _{i=0,\,{\tilde {x}}_{i}^{j}\in S(x_{k},r)}^{N_{s}-1}\left({\frac {y_{i+1}^{j}-y_{i}^{j}}{\Delta t}}\right),$$
+
+where $N_{k}$ is the number of trajectory points that fall in the square $S(x_{k},r)$. Similarly, the components of the effective diffusion tensor $D(x_{k})$ are approximated by the empirical sums
+
+$$D_{xx}(x_{k})\approx {\frac {1}{N_{k}}}\sum _{j=1}^{N_{t}}\sum _{i=0,\,x_{i}\in S(x_{k},r)}^{N_{s}-1}{\frac {(x_{i+1}^{j}-x_{i}^{j})^{2}}{2\,\Delta t}},$$
+
+  
+    
+      
+        
+          D
+          
+            y
+            y
+          
+        
+        (
+        
+          x
+          
+            k
+          
+        
+        )
+        ≈
+        
+          
+            1
+            
+              N
+              
+                k
+              
+            
+          
+        
+        
+          ∑
+          
+            j
+            =
+            1
+          
+          
+            
+              N
+              
+                t
+              
+            
+          
+        
+        
+          ∑
+          
+            i
+            =
+            0
+            ,
+            
+              x
+              
+                i
+              
+            
+            ∈
+            S
+            (
+            
+              x
+              
+                k
+              
+            
+            ,
+            r
+            )
+          
+          
+            
+              N
+              
+                s
+              
+            
+            −
+            1
+          
+        
+        
+          
+            
+              (
+              
+                y
+                
+                  i
+                  +
+                  1
+                
+                
+                  j
+                
+              
+              −
+              
+                y
+                
+                  i
+                
+                
+                  j
+                
+              
+              
+                )
+                
+                  2
+                
+              
+            
+            
+              2
+              
+              Δ
+              t
+            
+          
+        
+        ,
+      
+    
+    {\displaystyle D_{yy}(x_{k})\approx {\frac {1}{N_{k}}}\sum _{j=1}^{N_{t}}\sum _{i=0,x_{i}\in S(x_{k},r)}^{N_{s}-1}{\frac {(y_{i+1}^{j}-y_{i}^{j})^{2}}{2\,\Delta t}},}
+  
+
+  
+    
+      
+        
+          D
+          
+            x
+            y
+          
+        
+        (
+        
+          x
+          
+            k
+          
+        
+        )
+        ≈
+        
+          
+            1
+            
+              N
+              
+                k
+              
+            
+          
+        
+        
+          ∑
+          
+            j
+            =
+            1
+          
+          
+            
+              N
+              
+                t
+              
+            
+          
+        
+        
+          ∑
+          
+            i
+            =
+            0
+            ,
+            
+              x
+              
+                i
+              
+            
+            ∈
+            S
+            (
+            
+              x
+              
+                k
+              
+            
+            ,
+            r
+            )
+          
+          
+            
+              N
+              
+                s
+              
+            
+            −
+            1
+          
+        
+        
+          
+            
+              (
+              
+                x
+                
+                  i
+                  +
+                  1
+                
+                
+                  j
+                
+              
+              −
+              
+                x
+                
+                  i
+                
+                
+                  j
+                
+              
+              )
+              (
+              
+                y
+                
+                  i
+                  +
+                  1
+                
+                
+                  j
+                
+              
+              −
+              
+                y
+                
+                  i
+                
+                
+                  j
+                
+              
+              )
+            
+            
+              2
+              
+              Δ
+              t
+            
+          
+        
+        .
+      
+    
+    {\displaystyle D_{xy}(x_{k})\approx {\frac {1}{N_{k}}}\sum _{j=1}^{N_{t}}\sum _{i=0,x_{i}\in S(x_{k},r)}^{N_{s}-1}{\frac {(x_{i+1}^{j}-x_{i}^{j})(y_{i+1}^{j}-y_{i}^{j})}{2\,\Delta t}}.}
+  
+
+The moment estimation requires a large number of trajectories passing through each point, which matches the massive data sets generated by certain types of super-resolution techniques, such as those acquired by the sptPALM technique on biological samples. In theory, the exact inversion of Langevin's equation demands an infinite number of trajectories passing through any point x of interest. In practice, the drift and diffusion tensor are recovered after the region is subdivided by a square grid of radius r or by moving sliding windows (of the order of 50 to 100 nm).
+
+=== Automated recovery of the boundary of a nanodomain ===
+Algorithms based on mapping the density of points extracted from trajectories can reveal local binding and trafficking interactions and the organization of dynamic subcellular sites. The algorithms can be applied to study regions of high density revealed by SPTs. Examples are organelles such as the endoplasmic reticulum, or cell membranes. The method is based on spatiotemporal segmentation to detect the local architecture and boundaries of high-density regions for domains measuring hundreds of nanometers.
+
+== References ==
\ No newline at end of file
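Note on the estimators in the file above: the empirical sums for the drift and the diffusion tensor can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the method of any particular sptPALM package. The function name `estimate_drift_diffusion`, the representation of each trajectory as an `(n_points, 2)` array sampled at a fixed `dt`, and the reading of the square S(x_k, r) as `|x - cx| <= r` and `|y - cy| <= r` are all assumptions made for the example.

```python
import numpy as np


def estimate_drift_diffusion(trajectories, dt, center, r):
    """Empirical drift vector and diffusion tensor in one square bin.

    trajectories: iterable of arrays of shape (n_points, 2), sampled every dt.
    center, r: the bin is assumed to be |x - cx| <= r and |y - cy| <= r.
    Returns (drift, diffusion, n_k); drift and diffusion are None if n_k == 0.
    """
    center = np.asarray(center, dtype=float)
    kept = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        start = traj[:-1]                 # points x_i^j
        disp = traj[1:] - traj[:-1]       # displacements x_{i+1}^j - x_i^j
        inside = np.all(np.abs(start - center) <= r, axis=1)
        kept.append(disp[inside])
    steps = np.concatenate(kept) if kept else np.empty((0, 2))
    n_k = len(steps)                      # number of points falling in the bin
    if n_k == 0:
        return None, None, 0
    # Drift: mean displacement divided by dt.
    drift = steps.sum(axis=0) / (n_k * dt)
    # Diffusion tensor: mean outer product of displacements divided by 2*dt,
    # giving D_xx, D_xy and D_yy in a single 2x2 matrix.
    diffusion = steps.T @ steps / (2.0 * n_k * dt)
    return drift, diffusion, n_k
```

Calling the function once per grid cell (or per sliding-window position) yields a map of drift and diffusion over the region. Bins with few points give noisy estimates, which is why the text stresses the need for many trajectories per point.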
diff --git a/data/en.wikipedia.org/wiki/Single-subject_design-0.md b/data/en.wikipedia.org/wiki/Single-subject_design-0.md
index 4c3b83051..04960bf93 100644
--- a/data/en.wikipedia.org/wiki/Single-subject_design-0.md
+++ b/data/en.wikipedia.org/wiki/Single-subject_design-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Single-subject_design"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:52:03.952869+00:00"
+date_saved: "2026-05-05T09:56:53.416326+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Skeletochronology-0.md b/data/en.wikipedia.org/wiki/Skeletochronology-0.md
new file mode 100644
index 000000000..9f15d1907
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Skeletochronology-0.md
@@ -0,0 +1,59 @@
+---
+title: "Skeletochronology"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Skeletochronology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:43.322398+00:00"
+instance: "kb-cron"
+---
+
+Skeletochronology is a technique used to determine the individual, chronological ages of vertebrates by counting lines of arrested growth (LAGs), which form annually within skeletal tissues. Annual bone growth appears as alternating broad and narrow lines: broad lines represent the growth period and narrow lines represent a growth pause. Each narrow line marks one growth year, which makes the lines suitable for determining the age of the specimen. Not all bones grow at the same rate, and the growth rate of an individual bone changes over a lifetime, so periodic growth marks can form irregular patterns that indicate significant chronological events in an individual's life. The use of bone as a biomaterial is useful in investigating structure-property relationships. In addition to current research in skeletochronology, the ability of bone to adapt and change its structure in response to the external environment provides potential for further research in bone histomorphometry. Amphibians and reptiles are commonly aged using this method because they undergo discrete annual activity cycles such as winter dormancy or metamorphosis; however, it cannot be used for all species of bony animals. The environmental and biological factors that influence bone growth and development can become a barrier to determining age, as a complete record may be rare.
+
+
+== Method ==
+The extraction and study of bone tissue varies depending on the taxa involved and the amount of material available. However, skeletochronology works best on LAGs that encircle the entire shaft in a ring and have a regular pattern of deposition. These growths show a repeated pattern, 'described mathematically as a time series'. The tissues are sectioned using a microtome and stained with haematoxylin before being viewed under a microscope. The analysis is frequently performed on dry bones, with the additional application of alcohol or frozen preservation if needed, as the aim is to enhance the optical contrast that results from different physical properties with respect to light.
+It is important to consider potential problems when selecting particular bones to study. Weak optical contrast makes counting the arrested growth rings difficult and often inaccurate. Additional growth marks may also be present, created to supplement weaker areas of growth. In these circumstances, alternative bones that may yield more accurate data must be considered. Another case is the doubling of lines of arrested growth, where two closely adjacent twin lines can be seen. However, when the pattern is widespread across several age classes in that species, the twin LAGs can be counted as a single year of growth. The most common issue is the destruction of bone by biological processes, most frequently observed in mammals and birds. This causes age to be significantly underestimated. Over the lifespan of an individual, bone is constantly being reconstructed as specialised cells remove and deposit bone, leading to a constant renewal of the bone material. The continuous resorption and deposition leaves gaps in the record of growth, and bone tissue can be missing at any stage of a vertebrate's life cycle; 'complete specimens that allow precise identification are extremely rare'.
+Therefore, to account for any missing bone tissue in a specimen, the skeletal age must be retrocalculated.
+Three approaches to retrocalculation can be identified.
+1)    Retrocalculation of skeletal age, which involves identifying the major and minor axes of the bone's cross-section and calculating bone circumferences using Ramanujan's formula $C=\pi \left[3(a+b)-\sqrt{(a+3b)(3a+b)}\right]$.
+2)    Retrocalculation through arithmetic estimate, which requires sampling several parts of other bones and estimating the amount of missing tissue.
+3)    Retrocalculation by superimposition in an ontogenic series, which requires a complete growth record of one individual so that its histological cross-sections can be overlaid and reconstructed on another individual.
+Russian zoologist Galina Klevezal was a pioneering researcher in the field of skeletochronology, particularly in marine mammals.
+
+
+== References ==
\ No newline at end of file
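Note on Ramanujan's formula in the file above: the circumference calculation used in the first retrocalculation approach is easy to check numerically. The sketch below assumes that `a` and `b` are the semi-major and semi-minor axes of an elliptical cross-section, which is the usual convention for this approximation. The function name `ramanujan_circumference` is hypothetical and chosen only for illustration.

```python
import math


def ramanujan_circumference(a, b):
    """Approximate perimeter of an ellipse with semi-axes a and b.

    Implements C = pi * [3(a + b) - sqrt((a + 3b)(3a + b))].
    """
    if a <= 0 or b <= 0:
        raise ValueError("semi-axes must be positive")
    return math.pi * (3 * (a + b) - math.sqrt((a + 3 * b) * (3 * a + b)))


# For a circle (a == b) the formula reduces to the exact value 2 * pi * a.
circle = ramanujan_circumference(1.0, 1.0)
```

The formula is an approximation for a general ellipse, so for strongly flattened cross-sections the result should be treated as an estimate rather than an exact circumference.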
diff --git a/data/en.wikipedia.org/wiki/Smoking_gun-0.md b/data/en.wikipedia.org/wiki/Smoking_gun-0.md
new file mode 100644
index 000000000..a7737ec60
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Smoking_gun-0.md
@@ -0,0 +1,29 @@
+---
+title: "Smoking gun"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Smoking_gun"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:22.056126+00:00"
+instance: "kb-cron"
+---
+
+The term "smoking gun" is a reference to an object or fact that serves as conclusive evidence of a crime or similar act, just short of being caught in flagrante delicto. "Smoking gun" refers to the strongest kind of circumstantial evidence, as opposed to direct evidence. Direct evidence would be eyewitness testimony of someone who saw an actus reus (the actual alleged act), while connected events (the preceding chase, etc.) are considered circumstantial. 
+
+
+== Phrase origin ==
+The phrase originally came from the idea that finding a very recently fired (hence smoking) gun on the person of a suspect wanted for shooting someone would in that situation be nearly unshakable proof of having committed the crime. A variant of the phrase (as "smoking pistol") is used in the Sherlock Holmes story, "The Adventure of the Gloria Scott" (1893).
+
+
+== Extended meaning ==
+Its meaning has also extended to uses unrelated to criminal activity: for example, scientific evidence that is highly suggestive of a particular hypothesis is sometimes called "smoking gun evidence".
+
+
+== See also ==
+
+Burden of proof (law)
+Incontrovertible evidence
+Nixon White House tapes § "Smoking Gun" tape
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Social_media_mining-0.md b/data/en.wikipedia.org/wiki/Social_media_mining-0.md
new file mode 100644
index 000000000..1528527cc
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Social_media_mining-0.md
@@ -0,0 +1,24 @@
+---
+title: "Social media mining"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Social_media_mining"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:13.117723+00:00"
+instance: "kb-cron"
+---
+
+Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports targeted advertising and academic research. The term is an analogy to the process of mining for minerals. Mining companies sift through raw ore to find the valuable minerals; likewise, social media mining sifts through social media data in order to discern patterns and trends in matters such as social media usage, online behaviour, content sharing, connections between individuals, and buying behaviour. These patterns and trends are of interest to companies, governments and not-for-profit organizations, which can use the analyses to design strategies and to introduce programs, products, processes or services.
+Social media mining uses concepts from computer science, data mining, machine learning, and statistics. Mining is based on social network analysis, network science, sociology, ethnography, optimization and mathematics. It attempts to formally represent, measure and model patterns from social media data. In the 2010s, major corporations, governments and not-for-profit organizations began mining to learn about customers, clients and others.
+Platforms such as Google and Facebook (partnered with Datalogix and BlueKai) conduct mining to target users with advertising. Scientists and machine learning researchers extract insights and design product features.
+Users may not understand how platforms use their data. Users tend to click through Terms of Use agreements without reading them, leading to ethical questions about whether platforms adequately protect users' privacy.
+During the 2016 United States presidential election, Facebook allowed Cambridge Analytica, a political consulting firm linked to the Trump campaign, to analyze the data of an estimated 87 million Facebook users to profile voters, creating controversy when this was revealed.
+
+== Background ==
+As defined by Kaplan and Haenlein, social media is the "group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." There are many categories of social media including, but not limited to, social networking (Facebook or LinkedIn), microblogging (Twitter), photo sharing (Flickr, Instagram, Photobucket, or Picasa), news aggregation (Google Reader, StumbleUpon, or Feedburner), video sharing (YouTube, MetaCafe), livecasting (Ustream or Twitch), virtual worlds (Kaneva), social gaming (World of Warcraft), social search (Google, Bing, or Ask.com), and instant messaging (Google Talk, Skype, or Yahoo! messenger).
+The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without having a sophisticated knowledge of HTML coding. The first social networking site, SixDegrees.com, was introduced in 1997. Since then, many other social media sites have been introduced, each providing service to millions of people. These individuals form a virtual world in which individuals (social atoms), entities (content, sites, etc.) and interactions (between individuals, between entities, between individuals and entities) coexist. Social norms and human behavior govern this virtual world. By understanding these social norms and models of human behavior and combining them with the observations and measurements of this virtual world, one can systematically analyze and mine social media. Social media mining is the process of representing, analyzing, and extracting meaningful patterns from data in social media, resulting from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and evaluation dilemma.
+Social media mining represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, social media mining provides necessary tools to mine this world for interesting patterns, analyze information diffusion, study influence and homophily, provide effective recommendations, and analyze novel social behavior in social media.
+
+== Uses ==
+Social media mining is used across several industries including business development, social science research, health services, and educational purposes. Once the data received goes through social media analytics, it can then be applied to these various fields. Often, companies use the patterns of connectivity that pervade social networks, such as assortativity—the social similarity between users that are induced by influence, homophily, and reciprocity and transitivity. These forces are then measured via statistical analysis of the nodes and connections between these nodes. Social analytics also uses sentiment analysis, because social media users often relay positive or negative sentiment in their posts. This provides important social information about users' emotions on specific topics.
+These three patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network. Companies are interested in this information when deciding whom to hire for influencer marketing. These influencers are determined by recognition, activity generation, and novelty, three requirements that can be measured through the data mined from these sites. Analysts also value measures of homophily: the tendency of two similar individuals to become friends. Users have begun to rely on information about other users' opinions in order to understand diverse subject matter. These analyses can also help create tailored recommendations for individuals. By measuring influence and homophily, online and offline companies are able to suggest specific products to individual consumers and to groups of consumers. Social media networks can use this information themselves to suggest possible friends to add, pages to follow, and accounts to interact with.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Social_media_mining-1.md b/data/en.wikipedia.org/wiki/Social_media_mining-1.md
new file mode 100644
index 000000000..4b1c1b63f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Social_media_mining-1.md
@@ -0,0 +1,47 @@
+---
+title: "Social media mining"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Social_media_mining"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:13.117723+00:00"
+instance: "kb-cron"
+---
+
+== Perception ==
+Modern social media mining is a controversial practice that has led to exponential gains in user growth for tech giants such as Facebook, Inc., Twitter, and Google. These "Big Tech" companies build algorithms that use user input to understand user preferences and keep users on the platform as long as possible. These inputs, which can be as simple as time spent on a given screen, provide the data being mined, and companies profit heavily by using that data to make extremely accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place; as of 2021, most of the largest platforms averaged over 1 billion active users per month.
+It has been claimed by a multitude of anti-algorithm personalities, like Tristan Harris or Chamath Palihapitiya, that certain companies (specifically Facebook) valued growth above all else, and ignored potential negative impacts from these growth engineering tactics.
+At the same time, users have now created their own data arbitrages with the help of their own data, through content monetization and becoming influencers. Users typically have access to a varied set of analytics specific to people that interact with them on social media, and can use these as building blocks for their own targeting and growth strategies through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: Influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics, and allowed third parties to access that information as well, at times unbeknownst to even the user whose data is being viewed/bought.
+
+== Research ==
+
+=== Research areas ===
+Social media event detection – Social networks enable users to freely communicate with each other and share their recent news, ongoing activities or views about different topics. As a result, they can be seen as a potentially viable source of information to understand the current emerging topics/events.
+Public health monitoring and surveillance - Using large-scale analysis of social media to study large cohorts of patients and the general public, e.g. to obtain early warning signals of drug-drug interactions and adverse drug reactions, or understand human reproduction and sexual interest.
+Community structure (Community Detection/Evolution/Evaluation) – Identifying communities on social networks, how they evolve, and evaluating identified communities, often without ground truth.
+Network measures – Measuring centrality, transitivity, reciprocity, balance, status, and similarity in social media.
+Network models – Simulate networks with specific characteristics. Examples include random graphs (E-R models), Preferential attachment models, and small-world models.
+Information cascade – Analyzing how information propagates in social media sites. Examples include herd behavior, information cascades, diffusion of innovations, and epidemic models.
+Influence and homophily – Measuring network assortativity and measuring and modeling influence and homophily.
+Recommendation in social media – recommending friends or items on social media sites.
+Social search – Searching for information on the social web.
+Sentiment analysis in social media – Identifying collectively subjective information, e.g. positive and negative, from social media data.
+Social spammer detection – Detecting social spammers who send out unwanted spam content appearing on social networks and any website with user-generated content to targeted users, often corroborating to boost their social influence, legitimacy, credibility.
+Feature selection with social media data – Transforming feature selection to harness the power of social media.
+Trust in social media – Studying and understanding of trust in social media.
+Distrust and negative links – Exploring negative links in social media.
+Role of social media in crises – Social media, particularly Twitter, continues to play an important role during crises. Studies show that it is possible to detect earthquakes and rumors using tweets published during a crisis. Developing tools that help first responders analyze tweets for better crisis response, and techniques that give them faster access to relevant tweets, is an active area of research.
+Location-based social network mining – Mining Human Mobility for Personalized POI Recommendation on Location-based Social Networks.
+Provenance of information in social media – Provenance informs a user about the sources of a given piece of information. Social media can help in identifying the provenance of information due to its unique features: user-generated content, user profiles, user interactions, and spatial or temporal information.
+Vulnerability management – A user's vulnerability on a social networking site can be managed in three sequential steps: (1) identifying new ways in which a user can be vulnerable, (2) quantifying or measuring a user's vulnerability, and (3) reducing or mitigating it.
+Opinion mining on candidates/parties - Social media is a popular medium for candidates/parties to campaign and for gauging the public reaction to the campaigns. Social media can also be used as an indicator of the voters' opinion. Some research studies have shown that predictions made using social media posts can match (or even improve) traditional opinion polls.
+
+=== Publication venues ===
+Social media mining research articles are published in computer science, social science, and data mining conferences and journals:
+
+==== Conferences ====
+Conference papers can be found in the proceedings of Knowledge Discovery and Data Mining (KDD), World Wide Web (WWW), Association for Computational Linguistics (ACL), Conference on Information and Knowledge Management (CIKM), International Conference on Data Mining (ICDM), and Internet Measurement Conference (IMC).
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Social_media_mining-2.md b/data/en.wikipedia.org/wiki/Social_media_mining-2.md
new file mode 100644
index 000000000..b28891c48
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Social_media_mining-2.md
@@ -0,0 +1,69 @@
+---
+title: "Social media mining"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Social_media_mining"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:13.117723+00:00"
+instance: "kb-cron"
+---
+
+KDD Conference – ACM SIGKDD Conference on Knowledge Discovery and Data Mining
+WWW Conference –  International World Wide Web Conference
+WSDM Conference – ACM Conference on Web Search and Data Mining
+CIKM Conference – ACM Conference on Information and Knowledge Management
+ICDM Conference – IEEE International Conference on Data Mining
+Association for Computational Linguistics (ACL)
+ASONAM conference - IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
+Internet Measurement Conference (IMC)
+International Conference on Web and Social Media (ICWSM)
+International Conference on Social Media & Society
+International Conference on Web Engineering (ICWE)
+The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)
+International Joint Conferences on Artificial Intelligence (IJCAI)
+Association for the Advancement of Artificial Intelligence (AAAI)
+Recommender Systems (RecSys)
+Computer-Human Interaction (CHI)
+Social Computing Behavioral-Cultural Modeling and Prediction (SBP)
+HT Conference – ACM Conference on Hypertext
+SDM Conference – SIAM International Conference on Data Mining (SIAM)
+PAKDD Conference – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining
+MISNC - Multidisciplinary International Social Networks Conference
+
+==== Journals ====
+DMKD Conference – Research Issues on Data Mining and Knowledge Discovery
+ECML-PKDD Conference – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
+ACM Transactions on Intelligent Systems and Technology (TIST)
+ACM Transactions on Knowledge Discovery from Data (TKDD)
+ACM Transactions on the Web (TWEB)
+IEEE Transactions on Knowledge and Data Engineering (TKDE)
+IEEE Intelligent Systems
+Internet Mathematics
+International Journal of Social Network Mining (IJSNM)
+Knowledge and Information Systems (KAIS)
+World Wide Web Journal
+Social Network Analysis and Mining (SNAM)
+Social Networks
+SIGKDD Explorations
+Social media mining is also present at many data management and database conferences, such as the ICDE Conference, the SIGMOD Conference and the International Conference on Very Large Data Bases.
+
+== See also ==
+Methods
+Social media measurement
+Text mining
+Application domains
+Web mining
+Twitter mining
+Companies
+NUVI
+Related topics
+Social media
+Profiling (information science)
+Web scraping
+GDPR
+
+== References ==
+
+== External links ==
+Zafarani, Reza; Abbasi, Mohammad Ali; and Liu, Huan (2014); Social Media Mining: An Introduction, Cambridge University Press
+Barbier, Geoffrey; Feng, Zhuo; Gundecha, Pritam; Liu, Huan (2013). "Provenance Data in Social Media". Synthesis Lectures on Data Mining and Knowledge Discovery. 4: 1–84. doi:10.2200/S00496ED1V01Y201304DMK007. S2CID 46794494.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subdivision_bifiltration-0.md b/data/en.wikipedia.org/wiki/Subdivision_bifiltration-0.md
new file mode 100644
index 000000000..53e6b13d8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subdivision_bifiltration-0.md
@@ -0,0 +1,506 @@
+---
+title: "Subdivision bifiltration"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Subdivision_bifiltration"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:14.286535+00:00"
+instance: "kb-cron"
+---
+
+In topological data analysis, a subdivision bifiltration is a collection of filtered simplicial complexes, typically built upon a set of data points in a metric space, that captures shape and density information about the underlying data set. The subdivision bifiltration relies on a natural filtration of the barycentric subdivision of a simplicial complex by flags of minimum dimension, which encodes density information about the metric space upon which the complex is built. The subdivision bifiltration was first introduced by Donald Sheehy in 2011 as part of his doctoral thesis (later subsumed by a conference paper in 2012) as a discrete model of the multicover bifiltration, a continuous construction whose underlying framework dates back to the 1970s. In particular, Sheehy applied the construction to both the Vietoris-Rips and Čech filtrations, two common objects in the field of topological data analysis. Whereas single-parameter filtrations are not robust with respect to outliers in the data, the subdivision-Rips and subdivision-Čech bifiltrations satisfy several desirable stability properties.
+
+
+== Definition ==
+Let 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+ be a simplicial complex. Then a nested sequence of simplices 
+  
+    
+      
+        
+          σ
+          
+            1
+          
+        
+        ⊂
+        
+          σ
+          
+            2
+          
+        
+        ⊂
+        ⋯
+        ⊂
+        
+          σ
+          
+            k
+          
+        
+      
+    
+    {\displaystyle \sigma _{1}\subset \sigma _{2}\subset \cdots \subset \sigma _{k}}
+  
+ of 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+ is called a flag or chain of 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+. The set of all flags of 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+ comprises an abstract simplicial complex, known as the barycentric subdivision of 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+, denoted by 
+  
+    
+      
+        Bary
+        ⁡
+        (
+        T
+        )
+      
+    
+    {\displaystyle \operatorname {Bary} (T)}
+  
+. The barycentric subdivision is naturally identified with a geometric subdivision of 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+, created by starring the geometric realization of 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+ at the barycenter of each simplex.
+There is a natural filtration on 
+  
+    
+      
+        Bary
+        ⁡
+        (
+        T
+        )
+      
+    
+    {\displaystyle \operatorname {Bary} (T)}
+  
+ by considering for each natural number 
+  
+    
+      
+        k
+      
+    
+    {\displaystyle k}
+  
+ the maximal subcomplex of 
+  
+    
+      
+        Bary
+        ⁡
+        (
+        T
+        )
+      
+    
+    {\displaystyle \operatorname {Bary} (T)}
+  
+ spanned by vertices of 
+  
+    
+      
+        Bary
+        ⁡
+        (
+        T
+        )
+      
+    
+    {\displaystyle \operatorname {Bary} (T)}
+  
+ corresponding to simplices of 
+  
+    
+      
+        T
+      
+    
+    {\displaystyle T}
+  
+ of dimension at least 
+  
+    
+      
+        k
+        −
+        1
+      
+    
+    {\displaystyle k-1}
+  
+, which is denoted 
+  
+    
+      
+        
+          
+            
+              
+                S
+              
+              ~
+            
+          
+        
+        (
+        T
+        
+          )
+          
+            k
+          
+        
+      
+    
+    {\displaystyle {\tilde {\mathcal {S}}}(T)_{k}}
+  
+. In particular, by this convention, then 
+  
+    
+      
+        
+          
+            
+              
+                S
+              
+              ~
+            
+          
+        
+        (
+        T
+        
+          )
+          
+            1
+          
+        
+        =
+        Bary
+        ⁡
+        (
+        T
+        )
+      
+    
+    {\displaystyle {\tilde {\mathcal {S}}}(T)_{1}=\operatorname {Bary} (T)}
+  
+. Considering the sequence of nested subcomplexes given by varying the parameter 
+  
+    
+      
+        k
+      
+    
+    {\displaystyle k}
+  
+, we obtain a filtration on 
+  
+    
+      
+        Bary
+        ⁡
+        (
+        T
+        )
+      
+    
+    {\displaystyle \operatorname {Bary} (T)}
+  
+ known as the subdivision filtration. Since the complexes in the subdivision filtration shrink as 
+  
+    
+      
+        k
+      
+    
+    {\displaystyle k}
+  
+ increases, we can regard it as a functor 
+  
+    
+      
+        
+          
+            
+              
+                S
+              
+              ~
+            
+          
+        
+        (
+        −
+        )
+        :
+        
+          
+            N
+          
+          
+            op
+          
+        
+        →
+        
+          S
+          i
+          m
+          p
+        
+      
+    
+    {\displaystyle {\tilde {\mathcal {S}}}(-):\mathbb {N} ^{\operatorname {op} }\to \mathbf {Simp} }
+  
+ from the opposite posetal category 
+  
+    
+      
+        
+          
+            N
+          
+          
+            op
+          
+        
+      
+    
+    {\displaystyle \mathbb {N} ^{\operatorname {op} }}
+  
+ to the category 
+  
+    
+      
+        
+          S
+          i
+          m
+          p
+        
+      
+    
+    {\displaystyle \mathbf {Simp} }
+  
+ of simplicial complexes and simplicial maps.
+Let 
+  
+    
+      
+        P
+      
+    
+    {\displaystyle P}
+  
+ be a partially ordered set. Given a simplicial filtration 
+  
+    
+      
+        F
+        :
+        P
+        →
+        
+          S
+          i
+          m
+          p
+        
+      
+    
+    {\displaystyle F:P\to \mathbf {Simp} }
+  
+, regarded as a functor from the posetal category of 
+  
+    
+      
+        P
+      
+    
+    {\displaystyle P}
+  
+ to the category 
+  
+    
+      
+        
+          S
+          i
+          m
+          p
+        
+      
+    
+    {\displaystyle \mathbf {Simp} }
+  
+, by applying the subdivision filtration object-wise on 
+  
+    
+      
+        F
+      
+    
+    {\displaystyle F}
+  
+, we obtain a two-parameter filtration 
+  
+    
+      
+        
+          
+            S
+          
+        
+        (
+        F
+        )
+        :
+        
+          
+            N
+          
+          
+            op
+          
+        
+        ×
+        P
+        →
+        
+          S
+          i
+          m
+          p
+        
+      
+    
+    {\displaystyle {\mathcal {S}}(F):\mathbb {N} ^{\operatorname {op} }\times P\to \mathbf {Simp} }
+  
+, called the subdivision bifiltration.
+In particular, when we take 
+  
+    
+      
+        F
+      
+    
+    {\displaystyle F}
+  
+ to be the Rips or Čech filtration, we obtain bifiltrations 
+  
+    
+      
+        
+          
+            S
+          
+        
+        Rips
+        ⁡
+        (
+        −
+        )
+      
+    
+    {\displaystyle {\mathcal {S}}\operatorname {Rips} (-)}
+  
+ and 
+  
+    
+      
+        
+          
+            S
+          
+        
+        
+          
+            
+              
+                C
+                ˇ
+              
+            
+          
+          e
+          c
+          h
+        
+        ⁡
+        (
+        −
+        )
+      
+    
+    {\displaystyle {\mathcal {S}}\operatorname {{\check {C}}ech} (-)}
+  
+, respectively.
+
+
+== Properties ==
+The subdivision-Čech bifiltration is weakly equivalent to the multicover bifiltration, implying that they have isomorphic persistent homology. A combinatorial proof of this statement was given in Sheehy's original conference paper, but a more algebraic version was presented in 2017 by Cavanna et al. The ideas from Cavanna's proof were later generalized by Blumberg and Lesnick in a 2022 paper on 2-parameter persistent homology.
+By the size of a bifiltration, we mean the number of simplices in the largest complex. The subdivision-Čech bifiltration has exponential size as a function of the number of vertices. This implies that its homology cannot be directly computed in polynomial time. However, for points in Euclidean space, the homology of subdivision-Čech can be computed in polynomial time, up to weak equivalence, via a construction known as the rhomboid bifiltration. As a precursor to the rhomboid bifiltration, Edelsbrunner and Osang presented in 2021 a polyhedral cell complex called the rhomboid tiling, which they used to compute horizontal or vertical slices of the multicover bifiltration up to weak equivalence. This was extended a year later by Corbet et al. to the rhomboid bifiltration, which is weakly equivalent to the multicover bifiltration, but has polynomial size.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-0.md b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-0.md
new file mode 100644
index 000000000..0dc7be3dd
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-0.md
@@ -0,0 +1,31 @@
+---
+title: "Subpoena duces tecum"
+chunk: 1/6
+source: "https://en.wikipedia.org/wiki/Subpoena_duces_tecum"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:23.289503+00:00"
+instance: "kb-cron"
+---
+
+A subpoena duces tecum (pronounced in English  sə-PEE-nə DEW-seez TEE-kəm), or subpoena for production of evidence, is a court summons ordering the recipient to appear before the court and produce documents or other tangible evidence for use at a hearing or trial. In some jurisdictions, it can also be issued by legislative bodies such as county boards of supervisors.
+The summons is known by various names in different jurisdictions. The term subpoena duces tecum is used in the United States, and some other common law jurisdictions such as South Africa and Canada. The summons is called a "subpoena for production of evidence" in some U.S. states that have sought to reduce the use of non-English words and phrases in court terminology.
+The subpoena duces tecum is similar to the subpoena ad testificandum, which is a writ summoning a witness to testify orally. However, unlike the latter summons, the subpoena duces tecum instructs the witness to bring in hand books, papers, or evidence for the court. In most jurisdictions, a subpoena usually has to be served personally.
+
+== Etymology ==
+The phrase sub poena duces tecum is a Latin expression meaning literally "under [threat of] penalty [or punishment], you will bring [it] with you." The word sub means "under" and poena "penalty"; duces "you will lead, guide, pull, bring"; and tecum "with you".
+
+== Order pursuant to a deposition ==
+In the United States, a notice to a party (to the action) deponent (a person called to testify in a deposition) may be accompanied by a request for production of documents and other tangible things during the taking of a deposition. The notice to produce (literally: "bring these documents with you to the deposition") is served prior to the deposition. This follows the Federal Rules of Civil Procedure. The method of using a subpoena duces tecum is generally valid only to compel a witness to produce documents and other things at the time of the deposition. 
+If a deponent is a non-party to the action (not involved directly in the litigation, but wanted for testimony), production of documents can be compelled only through a proper subpoena duces tecum.
+Federal cases and some states follow Rule 27(a)(3) of the Federal Rules of Civil Procedure concerning the production of documents in pretrial discovery, including those pertaining to depositions. These can include the subpoena duces tecum to produce documents, or in some cases to undergo a physical or mental examination. In the Ninth Circuit, interpreting Rule 27 literally, it has been held that a party can simply produce the documents only, and in certain cases, avoid an oral deposition when presented with a subpoena duces tecum.
+
+=== Jencks Act criminal cases ===
+In the 1957 case Jencks v. United States the United States Supreme Court ruled that a defendant must have access to government witnesses who will testify against him in a criminal trial, and must also have access to any documents pertaining to that testimony. This includes papers, documents, written statements, and the like. This led to passage of the Jencks Act, 18 USC, Part II, Chapter 223, § 3500, which allows for subpoena duces tecum of relevant government documents, but only after a government agent or employee has testified at trial. There can be no pre-trial discovery. The subpoena is allowed by the trial judge. The government has the right to deny access to the documents. This may be due to the sensitive nature of the documents, or because they are classified.
+If a remedy is granted, there is a mistrial and dismissal of criminal charges. An accused criminal has no right to subpoena the work product of the prosecution in a criminal case.
+
+== US state court seeking to compel production of documents from a witness in another state ==
+The subpoena power of any state court in the United States generally ends at that state’s border. Consequently, lacking any powers outside the state's border, state prosecutors and defense attorneys in a state criminal case cannot use the same procedures to obtain a subpoena for an out-of-state witness that they would use for an in-state witness.
+
+== Compelling a foreign corporation to produce documents ==
+A domestic corporation may be considered to be a "person" within the meaning of the Fourteenth Amendment of the United States Constitution.  It is not necessary to treat a corporation as a person in all circumstances.  United States case law is confusing concerning this matter when dealing with foreign corporations, and their operation within the United States.  Especially troubling have been rulings concerning the Fourth Amendment of the United States Constitution and Fifth Amendment to the United States Constitution.  A foreign agent may not claim Fifth Amendment provisions against self-incrimination. Nor can records be withheld from subpoena duces tecum on the grounds that production of such documents would incriminate officers or other members of the foreign corporation.  However, there is case authority in which foreign corporations have been protected from illegal searches and seizures, including documents and books.    The matter of a foreign corporation operating as a "person" within the United States being afforded protection under the Fourteenth Amendment is discussed.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-1.md b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-1.md
new file mode 100644
index 000000000..ade58b947
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-1.md
@@ -0,0 +1,32 @@
+---
+title: "Subpoena duces tecum"
+chunk: 2/6
+source: "https://en.wikipedia.org/wiki/Subpoena_duces_tecum"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:23.289503+00:00"
+instance: "kb-cron"
+---
+
+== Failure to produce documents ==
+In the United States, a continuance (a rescheduling of a court hearing to a later date) of a civil action may be granted due to the absence of documents or papers. The party failing to produce the documents requested by a subpoena duces tecum must show good reason why there was a failure to do so. Acceptable explanations have included loss or destruction of papers, or an agreement to use copies. The party seeking the continuance must show that the absence of the documents is not due to its own negligence or that of its attorney of record.
+Similarly, a continuance may be granted in a criminal case if there is good reason documents pertinent to the case could not be produced at the time of trial. For example, a continuance should be granted for failure to produce a transcript of testimony given at a previous trial. In general, it is reversible error to proceed with a criminal trial in the absence of a previous trial transcript, when such contains pertinent information that should have been considered in the new trial. In these cases, a continuance is the usual remedy.
+
+=== Commitment of witness; contempt of court ===
+A witness who has refused to obey a lawful order to produce books, documents and papers may be incarcerated for contempt of court. A writ of habeas corpus will not apply unless it can be shown the witness could not have legally had possession of such documents. In such a situation the writ of habeas corpus will properly apply, and is the remedy for such improper action.
+At common law, and under various statutes pertaining to a given jurisdiction, a right to action for damages, or for a statutory penalty or forfeiture, exists against a witness who, without sufficient excuse, fails or refuses to give oral testimony or to produce documents or other specified items in obedience to the command of a properly issued and served subpoena.
+There are certain conditions precedent, or defenses, to a recovery of damages for a person's failure to testify, or to provide documents pertinent to a hearing or trial. There must be a breach of testimonial duty, after having been properly served with a legitimately executed subpoena. There must be a demonstration of actual damages incurred from the absence of testimony. Most courts have rejected the arguments for seeking damages in this kind of case. Giving false testimony in a judicial proceeding even though the allegation is made that the person giving the testimony knew it to be false does not give rise, either at common law or by statute, to a civil action for damages resulting from such testimony. The situation is probably different if intentionally false documents are submitted under a subpoena duces tecum.
+
+== Writ of mandamus vacating an order to produce documents ==
+A writ of mandamus (Latin for "we command") is appropriate to compel surrender of documents in the possession of attorneys or other persons that have been illegally obtained under the abuse of a writ of attachment.  Mandamus can vacate an order to produce books and papers.
+
+In an 1893 case, the United States Attorney for Alabama refused to vacate his office, refusing to surrender books, papers, and other materials to the newly appointed US Attorney. The federal court in Alabama issued a writ directing the previous attorney to relinquish the documents. He, in turn, sought relief from the Supreme Court, which denied his application, saying it would not interfere with the properly conducted internal matters of a court. In the case In re: Parsons, the US Supreme Court wrote: "If the orders be regarded merely as directions in the administration of judicial affairs in respect of the immediate possession of property or custody of prisoners, we cannot be properly called to, by reason of anything appearing on these records, in the exercise of appellate jurisdiction in this manner, to direct them to be set aside. And if the proceedings should be treated as involving a final determination as on issues joined to the right to such possession and custody, there was no complaint of want of notice or of hearing, and the summary made adopted did not in itself affect the jurisdiction of the Circuit Court upon the ground that it had exceeded its powers."
+Mandamus is the remedy where a lower court has clearly failed to issue compulsion to produce documents, or to allow the petitioner access to such documents as may be in the possession of the court or the parties to the action. Mandamus can be used to compel a court to enforce an order to answer interrogatories (questions submitted by the court or one of the parties to be answered under oath and pain of perjury).
+Mandamus is the proper remedy to compel the quashing of a subpoena duces tecum for the production before a grand jury of documents protected by attorney–client privilege.  Presumably, this would apply to attorney work product, although there is no case law on the matter.
+
+== Public access to documents filed with the court ==
+The right of the public to access judicial records is fundamental to a democratic state and is analogous to the United States' First Amendment right of freedom of speech and of the press and the Sixth Amendment right to public trials.  While the right to access trial records is not absolute, it is framed in presumption of public access to the proceedings and records.    United States Code 11, Section 107 (a), of the federal bankruptcy law, is a codification of the common-law general right to inspect judicial records and documents.  However, the right is not absolute and may be denied when the entity seeking to view the records has an improper purpose.  The general intent of the statute is to favor public access to court documents.
+
+== Specific types of documents ==
+
+=== Privileged documents ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-2.md b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-2.md
new file mode 100644
index 000000000..3cf4f1d6e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-2.md
@@ -0,0 +1,23 @@
+---
+title: "Subpoena duces tecum"
+chunk: 3/6
+source: "https://en.wikipedia.org/wiki/Subpoena_duces_tecum"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:23.289503+00:00"
+instance: "kb-cron"
+---
+
+Attorney–client privilege is generally recognized by the courts. Communications between lawyer and client are generally immune from subpoena. In other words, a lawyer cannot be compelled to testify in a trial unless the lawyer becomes, or appears to become, a party to the litigation. A similar situation exists with "work product", meaning written documents or computer records generated in preparation for a trial or hearing. This includes information such as potential questions that may be asked of witnesses, lists of possible witnesses, memoranda, notes, trial strategies, written briefs, or documents that may, or may not end up being used in the course of litigation. Usually, none of this can be the subject of a subpoena duces tecum. If a communication between lawyer and client is made in the presence of a  third party, the privilege is not recognized to exist.
+The federal courts will apply the common law rule of attorney–client privilege unless there is an intervening state law applying to the central issues of the matter. In those cases, the federal court uses the effective state law.
+Physician–patient privilege is usually statutorily defined, and can vary from state to state.  The usual rule is that medical records are immune from subpoena if the plaintiff has not alleged physical or mental injuries or damages.  Once the plaintiff alleges physical or mental injuries proximately flowing from a potentially tortious act by the defendant, or in some other disability hearing, medical records can be subject to subpoena duces tecum.  While witnesses may try to resist legal discovery  by asking the judge to protect them from questioning or inspection of documents, the policy of the courts is in favor of full disclosure. It is the intent of the rules of procedure that pre-trial discovery take place without any intervention of a judge. So-called "fishing expeditions" (massive and aimless calls for all documents related to the litigation) are permissible under Federal Rule of Civil Procedure 26 (b) (1). This rule is repeated in many states' rules of procedure: "Parties may obtain discovery regarding any matter, not privileged, which is relevant ... if the information sought appears reasonably calculated to lead to the discovery of admissible evidence."  The looseness of the definition of relevant evidence is generally construed to mean "liberal" production. The physician who is the party to an action does not own the records of patients he has treated. They are not privileged if the patient has waived confidentiality. Physicians must produce medical records under subpoena duces tecum.
+Peer review records and other hospital documents of quality control committee meetings are generally not subject to subpoena duces tecum, since these have statutory immunity. The theory is that the frankness of peer review would be chilled if these records could be routinely compelled.
+Several United States Federal Circuit Courts have recognized a limited reporter's privilege.
+In some states (such as California), rape crisis counselors and domestic violence advocates hold a statutory privilege analogous to therapist–client privilege. (See, for example, 1035 Cal. Evidence Code for rape crisis advocates, and 1037.6 Cal. Evidence Code for domestic violence advocates).
+
+=== Welfare documents ===
+Statutes governing the disclosure of information contained in welfare records exist in many jurisdictions. The rationale for the existence of these regulations is to encourage full and frank disclosure by the welfare recipient of his situation and the protection of the recipient from the embarrassment likely to result from the disclosure of information contained in such records. In some states, records can be disclosed at the discretion of the state director of welfare. In general, welfare records are not public records, and should not be considered to be such. Disclosure of information is usually limited to purposes directly connected with the administration of welfare benefits. The investigation of costs of welfare programs has been held to be sufficiently related to the matters in question to justify disclosure. Statutes designed to limit welfare record availability are generally held by the courts to be not immune from the power of subpoena duces tecum.
+Certain state laws limit the availability of information that can be obtained from the subpoena of such documents. These are always subject to a court challenge, on a case-by-case basis. Welfare recipients are generally allowed access to their files, by subpoena duces tecum. Death of a welfare recipient is considered in some states to be sufficient reason to remove the reason for confidentiality. Some states have passed so-called "Right to Know" statutes, which would make information about welfare recipients available to the public. These, along with common law, and state and federal constitutions guaranteeing freedom of the press, do not give newspapers (or other news media) the right to access the names of persons on welfare, or the amounts they receive.
+
+=== Documents in bankruptcy proceeding ===
+An entity (person or a corporation) may be compelled to produce documentary evidence in accordance with the subpoena powers of Federal Rule of Civil Procedure 45 as applied by Bankruptcy Rule 9016. The United States Bankruptcy Court has powers to compel production of documents from a non-debtor corporation or person concerning transactions involving the debtor corporation or person. Production of documents can be challenged as being burdensome. Assets diverted to outside corporations or bank accounts/stock portfolios and such other assets as land holdings lie within the power to compel production under subpoena duces tecum. Federal law recognizes no accountant-client privilege. A subpoena duces tecum served pursuant to Bankruptcy Rule 2004 is not a violation of accountant-client privilege. 11 United States Code section 107 (a) provides that papers filed in cases under the Bankruptcy Code and dockets of the Bankruptcy Courts are public records and are to be open to examination at reasonable times without charge.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-3.md b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-3.md
new file mode 100644
index 000000000..eb57d0028
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-3.md
@@ -0,0 +1,40 @@
+---
+title: "Subpoena duces tecum"
+chunk: 4/6
+source: "https://en.wikipedia.org/wiki/Subpoena_duces_tecum"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:23.289503+00:00"
+instance: "kb-cron"
+---
+
+=== Documents relating to Federal Trade Commission hearings in monopoly actions ===
+Whenever the Federal Trade Commission (FTC) has reason to believe that any person has violated 15 USC section 13, 14, 18 or 19, it must issue and serve on that person and on the Attorney General of the United States, a complaint stating its charges in that regard.  The notice shall also give a date for a hearing in the matter.  Delivery of the subpoena duces tecum for production of documents may be done in person, or by certified letter.  Receipt of the letter is considered proof of service.
+Power to issue subpoenas is extended to Robinson–Patman Act cases of price-fixing and Clayton Act cases of unlawful acquisition.
+A Federal District Court lacks jurisdiction to enjoin the Federal Trade Commission from proceeding in an investigation. It cannot stay (stop) a subpoena duces tecum to produce documents in the investigative stage. An injunction by a federal court does not have the power to restrain the FTC from enforcing an order requiring corporations to furnish reports and documents under 15 USC § 49. The only relief available to stop a demand for documents is to seek an action of compliance in mandamus by the Attorney General of the United States, or under 15 USC § 50 to enforce fines and forfeitures.
+If the FTC institutes an adjudicative proceeding (a hearing), the person who originated the matter by complaining to the FTC is not a party to the action and does not have any control over it. The FTC may allow the complaining person to participate in the proceeding by virtue of 15 USC, section 45. This allows participation for good cause, either by counsel (lawyer) or in person. A person cannot intervene in an FTC hearing, except by demonstrating that substantial issues of law or fact would not otherwise be properly raised and argued, and that these issues are important and immediate enough to warrant additional expenditure of FTC resources. This involvement can be enhanced by subpoena duces tecum.
+Pre-hearing conferences are the norm.  These are useful in:
+
+Clarifying or simplifying issues
+Amending pleadings
+Entering stipulations, admissions of fact, and contents and authenticity of documents
+Expediting discovery and presentation of evidence, including restriction of witnesses
+Matters subject to official notice that may be resolved by further production of documents related to the case
+In general, pre-hearing conferences are not public.   The FTC is not restricted by a rigid rule of evidence.
+
+=== Documents in execution proceedings ===
+Discovery can be authorized for the production of documents for both pre-trial and post-trial actions. Most states either follow, or have modeled their procedures after, the Federal Rules of Civil Procedure Rule 69(a).
+Judgment creditors (those who have received a favorable court ruling for monetary damages) are permitted to ask questions about a debtor's residence; recent employment history; business relationships, including partners, co-shareholders, co-officers, co-directors; the contents of a will; transfers of property; and the identity of persons who either owed a debt to the judgment debtor, or received things of value from the debtor. Information in bank accounts can also be the subject of a subpoena duces tecum.
+In federal court proceedings concerning judgment debtors, the inquiry is usually limited to the discovery of assets. In international cases tried in United States federal courts, the Hague Service Convention is applied where appropriate.
+
+=== Medical records ===
+
+==== Administrative law ====
+Disabled persons under the age of 65 years can be eligible for disability benefits under Social Security Titles II and XVI.
+The seminal case in Social Security law is Richardson v. Perales, a Supreme Court decision from 1971. The court directed that medical reports put forth by a treating physician in Social Security hearings should be accepted as evidence, despite the hearsay nature of the medical records. These should be accepted, even if cross-examination is not available. The claimant has the right to subpoena the treating physician. In cases of conflicting medical evidence, it is not unconstitutional for the hearing officer to obtain independent medical advice to help resolve the physical questions involved. Under the Administrative Procedure Act, hearsay in the form of medical records is admissible up to the point of relevancy.
+Several federal agencies have adopted Jencks Act rules. Although the Jencks Act applies only to government agents or employees who testify in criminal cases, making these witnesses and relevant documents available for cross-examination after testimony, it has been applied in administrative law cases in the interests of justice and fair play. The party of record must make an official request to the hearing officer to have Jencks rules followed. The rules of some agencies, such as the National Labor Relations Board, automatically follow Jencks Act requirements.
+
+==== Medical malpractice actions ====
+In a case of alleged negligence by a physician, written summaries of the case by physicians provided to the insurance carrier or other parties can be the subject of a subpoena duces tecum, if, in the opinion of the court, they are relevant to the plaintiff's case.  Claims that these statements are "work product" will generally fail.
+Medical records form the core of any medical malpractice case.  Actions for malpractice are controlled by the general rules of evidence in civil procedure.  A malpractice action necessarily involves the question of requisite care and skill applied in a medical case.  With the exception of res ipsa loquitur cases, medical opinion about the care is essential.  This involves the necessity to obtain a subpoena duces tecum for medical records.
+Admission of "learned treatises" (published books and medical articles) at trial varies from jurisdiction to jurisdiction.  Some require that the expert admit it is an authoritative reference. Others will allow admission of learned treatises by judicial notice.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-4.md b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-4.md
new file mode 100644
index 000000000..cc0a19e14
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-4.md
@@ -0,0 +1,35 @@
+---
+title: "Subpoena duces tecum"
+chunk: 5/6
+source: "https://en.wikipedia.org/wiki/Subpoena_duces_tecum"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:23.289503+00:00"
+instance: "kb-cron"
+---
+
+==== Experts and opinion evidence ====
+In tort actions for recovery of damages, medical records must be introduced to establish a basis for the claimed loss. An injured plaintiff is entitled to recover the expenses necessary to cure or treat injuries. After examining the medical records, experts are frequently called upon by courts to interpret them and to advise on the nature of the injuries, future medical needs, disability and other issues before the court.
+
+==== Worker's Compensation actions ====
+Medical records introduced as evidence are crucial in determining both causation and impairment in worker's compensation cases. In cases where the evidence is contested, medical evidence in the form of records, opinions, affidavits and testimony concerning both fact and opinion is necessary. When oral testimony is taken from physicians, the usual standard is to state an opinion "within a reasonable degree of medical certainty". Worker's compensation laws are dictated by state statute or by the Federal Employers Liability Act. In many states, the employer has the right to demand an independent examination and can also direct that treatment be carried out by certain physicians.
+
+==== Mandatory reporting of child abuse ====
+In the landmark 1976 California case of Landeros v. Flood, the California Supreme Court remanded a case to the trial court for action in tort against a treating physician for failure to report suspected child abuse. The theory at trial was that the plaintiff, a child of about 12 months of age, had been returned to a home where further physical abuse occurred, causing further injury, because the physician had failed to report the abuse in violation of California law. After this case, all states instituted mandatory reporting by physicians and other medical personnel of any suspected child abuse or neglect cases. In general, reporting in good faith shields the physician or health care worker from tort liability. Reporting to police or social services necessitates obtaining medical records by subpoena duces tecum. This case and the legislation that followed it were a response to several articles in the medical literature that defined battered child syndrome and child abuse syndrome.
+The 1962 Social Security Amendments require each state to make child welfare services available throughout the state to all children and to provide coordination between child welfare services (Title IV-B) and social services provided under the Aid to Families with Dependent Children Act (ADC, later known as AFDC; now called Title XX). Determinations in these cases frequently require production of medical records.
+In 1972, Congressional hearings began on child abuse and neglect. In response, Congress passed the Child Abuse Prevention and Treatment Act, which defined abuse as "physical or mental injury, negligent treatment, or maltreatment of a child under the age of 18 by a person who is responsible for the child's welfare under circumstances which would indicate that the child's health or welfare is harmed or threatened thereby". The legislation created the National Center on Child Abuse and Neglect as an information clearinghouse.
+The Child Abuse Prevention and Treatment Act of 1974 (42 U.S.C. § 5101 – 42 U.S.C. § 5106) defined "child abuse and neglect" as "physical or mental injury, sexual abuse, negligent treatment, or maltreatment of a child under the age of eighteen by a person responsible for the child's welfare under circumstances which indicate that the child's health or welfare is harmed or threatened thereby."
+The Child Abuse Prevention and Treatment Act of 1988, when enacted, expanded the definition of abuse. Sexual crimes were specifically identified in the Sex Crimes Against Children Act of 1995. These laws have made child abuse a federal crime, and routinely mandate production of medical records.
+
+==== Mandatory reporting of wounds and injuries ====
+Physician-patient privilege is defined and limited by statute. Many jurisdictions have mandatory reporting laws requiring treating physicians or other medical personnel to report any suspicious injury to police or other appropriate authorities. These requirements may be imposed by statute, ordinance or regulation. Some of these may be limited to wounds typically inflicted by gun or knife. There may be similar reporting requirements in cases of domestic violence. These statutes have generally been upheld against constitutional challenges. Reporting of such cases usually voids any challenge to a subpoena duces tecum of the medical records by police or state authorities.
+
+==== Peer review records in medical licensing and hospital credential actions ====
+The removal of a doctor from a hospital staff, or the revocation or limitation of a license to practice medicine, usually involves various state and federal immunities. The Healthcare Quality Improvement Act (HCQIA) of 1986 granted doctors sitting on peer review committees immunity from subpoena duces tecum, or liability for the revocation of hospital privileges of other doctors. The matters of peer review cannot, in the normal course of events, be the subject of a subpoena duces tecum. This has led to claims that powerful doctors can abuse the process to punish other doctors for reasons unrelated to medical issues (termed "sham peer review").
+The American Medical Association conducted a probe of the sham peer review issue and found that no pervasive problem exists. Allegations of sham peer review are easy to make (for example, by doctors whose medical mistakes have made them targets of peer review), but actual infractions are rare. Opponents of peer review counter that the scarcity of successful challenges is indicative of how widespread the problem is and how difficult these actions are to win.
+
+== See also ==
+
+== References ==
+
+=== Notes ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-5.md b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-5.md
new file mode 100644
index 000000000..ba92bc3cf
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subpoena_duces_tecum-5.md
@@ -0,0 +1,100 @@
+---
+title: "Subpoena duces tecum"
+chunk: 6/6
+source: "https://en.wikipedia.org/wiki/Subpoena_duces_tecum"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:23.289503+00:00"
+instance: "kb-cron"
+---
+
+=== Sources ===
+11 USCS section 107 (a)
+Federal Rule 27 (a) (3)
+FRCP 30 (b) (5)
+FRCP 34
+FRCP 69 (a)
+Caffey, "Multiple Fractures in the Long Bones of Infants Suffering from Chronic Subdural Hematoma", 56 Am. J. Roentgen 163 (1946)
+Caffey, "Some Traumatic Lesions in Growing Bones Other Than Fractures and Dislocation – Clinical and Radiological Features", 30 Br. J. Radiol. 225, 1957
+Kempe, "The Battered Child Syndrome", Journal of the American Medical Association, 181, July 7, 1962
+Malone, Plant and Little, "Worker's Compensation and Employment Rights", West, 1980
+Pegalis, S. and Wachsman, H., "American Law of Medical Malpractice", Lawyers Cooperative, Bancroft Whitney, 1980
+Sharpe, D., Fiscina, S. and Head, M., "Law and Medicine" West, 1978
+Stein, J., "Damages and Recovery, Personal Injury and Death Actions", Lawyers Cooperative, Bancroft Whitney, 1972
+
+==== American jurisprudence ====
+2 Am Jur 2nd "Administrative Law", section 328 (Jencks Act)
+9 Am Jur 2nd "Bankruptcy", section 829, 828–829
+16 A Am Jur 2nd "Constitutional Law", section 738
+17 Am Jur 2nd "Continuance", sections 20, 81
+21 A Am Jur 2nd "Criminal Law", section 666 et seq; 876 et seq
+23 Am Jur 2nd "Depositions and Discovery", sections 126–127
+29 A Am Jur 2nd  "Evidence", sections 1416–1420
+30 Am Jur 2nd "Executions, Etc.", sections 720, 714, 722
+31 A Am Jur 2nd "Expert and Opinion Evidence" sections 127–277
+36 Am Jur 2nd "Foreign Corporations" sections 4–45
+39 Am Jur 2nd "Habeas Corpus", section 97
+52 Am Jur 2nd  "Mandamus", section 314, 367
+54 Am Jur 2nd "Monopolies", sections 394, 398–399, 836, 840, 862
+61 Am Jur 2nd "Physicians, Surgeons, Etc."  sections 200–377
+70 A Am Jur 2nd "Social Security and Medicare", sections 468 et seq
+75 Am Jur 2nd "Trial", sections 205–216
+79 Am Jur 2nd "Welfare", section 50
+81 Am Jur 2nd "Witnesses", section 79, 172 et seq
+82 Am Jur 2nd "Worker's Compensation", sections 504 et seq
+
+==== American law reports ====
+48 ALR Fed 259
+49 ALR Fed 674
+64 ALR Fed 971 (learned treatises)
+10 ALR 1152
+41 ALR 433 (mandamus)
+49 ALR 732
+77 ALR 1490
+112 ALR 438 (mandamus)
+120 ALR 1103
+128 ALR 682
+151 ALR 475
+76 ALR 2nd 946
+88 ALR 2nd 650
+90 ALR 2nd 1323
+2 ALR 3rd 286
+9 ALR 3rd 1413
+14 ALR 3rd 594
+21 ALR 3rd 912 (workers' comp discovery)
+44 ALR 3rd 24
+55 ALR 3rd 1322
+59 ALR 3rd 441
+61 ALR 3rd 1297
+81 ALR 3rd 1297 section 3 (b), 8 (a), 9(a)
+85 ALR 3rd 1196 (mandatory reporting of suspicious wounds)
+97 ALR 3rd 324 (Landeros v. Flood)
+1 ALR 4th 1124
+22 ALR 4th 774
+
+==== Proof of facts ====
+2 Proof of Facts 2nd 365 et seq (child abuse)
+3 Proof of Facts 2nd 265 et seq (child abuse)
+6 Proof of Facts 2nd 345 et seq (child abuse)
+
+==== Case law citation ====
+Barron v. Florida Freedom Newspapers Inc., (Fla) 531 So 2nd 113, 13 FLW 497, 15 Media LR 1901
+Barsky v. Board of Regents, Supreme Court of the United States, 1954, 347 US 442, 74 S. Ct. 650, 98 L. Ed. 829
+Butler v. Doyle, Supreme Court of Arizona, 112 Ariz. 522, 544 P. 2nd 204
+Colorado State Board of Medical Examiners v. District Court, 191 Colo. –, 551 P. 2nd 194 (1976)
+Continental Oil Co. v. United States (Ca 9 Ariz) 330 F 2nd 347 reprinted in 9 ALR 3rd 1413
+Ex Parte Clarke, 126 Cal, 235, 58 P 546
+Fairbank v. Hardin (CA 9) 429 F2d 264, cert den 400 US 943, 27 L Ed 2nd 247, 91 S. Ct. 244
+Globe Newspaper Co. v. Superior Court of County of Norfolk, 457 US 596, 73 L Ed 2nd 248, 102 S. Ct. 2613, 8 Media LR 1689
+In Re Parsons, 150 US 150, 37 L Ed 1034, 14 S. Ct. 50
+International Harvester Co. v. Eaton Circuit Judge, 163 Mich 5, 127 NW 695
+Jencks v. United States, 353 US 657 (1957)
+Klinge v. Lutheran Charities Ass'n, United States Court of Appeals for the Eighth Circuit, 1975, 523 F. 2nd 56
+Landeros v. Flood, 17 Cal. 3rd 399, 131 Cal. Reporter 69, 551 P.2nd 389
+Matchett v. Superior Court, 40 Cal. App. 3rd, 623, 115 Cal. Reporter 317 (1974)
+Oxnard Publishing Co. v. Superior Court of Ventura County (Cal App) 68 Cal Reporter 83
+Richardson v. Perales, 91 S. Ct. 1420 (1971)
+Press-Enterprise Co. v. Superior Court of California, 478 US 1, 92 L Ed 2nd 1, 106 S. Ct. 2735, 13 Media LR 1001
+Re Iowa Freedom of Information Council (CA Iowa) 724 F 2nd 658, 10 Media LR 1120;
+Rosenthal v. Dickerman, 98 Mich 208, 57, NW 112
+Smith v. Superior Court of San Joaquin County, 189 Cal.App.2d 6; 1 Cal Reporter reprinted in 88 ALR 2nd 650
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Symbolic_data_analysis-0.md b/data/en.wikipedia.org/wiki/Symbolic_data_analysis-0.md
new file mode 100644
index 000000000..9e6cab282
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Symbolic_data_analysis-0.md
@@ -0,0 +1,24 @@
+---
+title: "Symbolic data analysis"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Symbolic_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:15.432627+00:00"
+instance: "kb-cron"
+---
+
+Symbolic data analysis (SDA) is an extension of standard data analysis in which symbolic data tables are used as input and symbolic objects are produced as output. The data units are called symbolic because they are more complex than standard ones: they contain not only values or categories but also internal variation and structure. SDA is based on four spaces: the space of individuals, the space of concepts, the space of descriptions, and the space of symbolic objects. The space of descriptions models individuals, while the space of symbolic objects models concepts.
+
+
+== References ==
+
+
+== Further reading ==
+Diday, Edwin; Noirhomme-Fraiture, Monique (2008). Symbolic Data Analysis and the SODAS Software. Wiley–Blackwell. ISBN 9780470018835.
+
+
+== External links ==
+Symbolic Data Analysis: Conceptual Statistics and Data Mining
+An introduction to symbolic data analysis and its Application to the Sodas Project by Edwin Diday
+R2S: An R package to transform relational data into symbolic data
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Task_Force_on_Process_Mining-0.md b/data/en.wikipedia.org/wiki/Task_Force_on_Process_Mining-0.md
new file mode 100644
index 000000000..85b974a09
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Task_Force_on_Process_Mining-0.md
@@ -0,0 +1,35 @@
+---
+title: "Task Force on Process Mining"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Task_Force_on_Process_Mining"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:16.578150+00:00"
+instance: "kb-cron"
+---
+
+The IEEE Task Force on Process Mining (TFPM) is a non-commercial association for process mining. It was established in October 2009 as part of the IEEE (Institute of Electrical and Electronics Engineers) Computational Intelligence Society at the Eindhoven University of Technology.
+The task force is supported by over 80 organizations and has around 750 members. The main goal of the task force is to promote the research, development, education, and understanding of process mining.
+
+
+== About ==
+In 2012, the IEEE World Congress on Computational Intelligence / IEEE Congress on Evolutionary Computation held a session on process mining. Process mining is a research field that combines computational intelligence and data mining with process modeling and analysis.
+
+
+=== Activities and organization ===
+The Task Force on Process Mining has a Steering Committee and an Advisory Board. The Steering Committee, chaired by Wil van der Aalst at its inception in 2009, defined 15 action lines. These include the organization of the annual International Conference on Process Mining (ICPM) series, standardization efforts leading to the IEEE XES standard for storing and exchanging event data, and the Process Mining Manifesto, which was translated into 16 languages. The Task Force on Process Mining also publishes a newsletter, provides data sets, organizes workshops and competitions, and connects researchers and practitioners.
+In 2016, the IEEE Standards Association published the IEEE Standard for Extensible Event Stream (XES), a file format widely accepted by the process mining community.
+As of 2023, Boudewijn van Dongen serves as chair of the Steering Committee. Wil van der Aalst and Moe Wynn both serve as vice-chairs of the Steering Committee.
+
+
+== See also ==
+Process mining
+Business process management
+
+
+== References ==
+
+
+== Further reading ==
+Aalst, W. van der (2016). Process Mining: Data Science in Action. Springer Verlag, Berlin (ISBN 978-3-662-49850-7).
+Reinkemeyer, L. (2020). Process Mining in Action: Principles, Use Cases and Outlook. Springer Verlag, Berlin (ISBN 978-3-030-40171-9).
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Tempore-0.md b/data/en.wikipedia.org/wiki/Tempore-0.md
new file mode 100644
index 000000000..64d48ae4a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Tempore-0.md
@@ -0,0 +1,16 @@
+---
+title: "Tempore"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Tempore"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:44.514312+00:00"
+instance: "kb-cron"
+---
+
+Tempore (abbreviated to temp.), in historical literature, denotes a period during which a person whose exact lifespan is unknown was known to have been alive or active, or some other date which is not exactly known, usually given as the reign of a monarch. The word is Latin, being the ablative singular of the noun tempus, temporis, "time", thus meaning "in the time (of)". It should be followed by a name in the genitive case. The theoretical full form might be vixit tempore Regis Henrici Primi ("they lived in the time of King Henry the First"; i.e. 1100–1135).
+The best-known occurrence is in the Domesday Book of 1086, where the phrase Tempore Regis Eduardi (nominative case Rex Eduardus), meaning "in the time of King Edward (the Confessor)" appears in the entry for almost every manor, abbreviated as TRE. It thus signifies the date range 1042–1066. It is useful in historical literature because the names of many historical persons appear in surviving documents only in royal charters, possibly as witnesses, which can be dated to the reign of the originating monarch.
+The word tempore is often given in its abbreviated form temp. It is similar to floruit ("flourished" [at a date or range of dates]), which however is more appropriate for artists to denote not merely a period of life, but a particularly productive period within that lifespan.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Terminus_post_quem-0.md b/data/en.wikipedia.org/wiki/Terminus_post_quem-0.md
new file mode 100644
index 000000000..cae2d289e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Terminus_post_quem-0.md
@@ -0,0 +1,34 @@
+---
+title: "Terminus post quem"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Terminus_post_quem"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:45.704663+00:00"
+instance: "kb-cron"
+---
+
+A terminus post quem ('limit after which', sometimes abbreviated TPQ) and terminus ante quem ('limit before which', abbreviated TAQ) specify the known limits of dating for events or items.
+A terminus post quem is the earliest date the event may have happened or the item was in existence, and a terminus ante quem is the latest.  An event may well have both a terminus post quem and a terminus ante quem, in which case the limits of the possible range of dates are known at both ends, but many events have just one or the other.  Similarly, a terminus ad quem 'limit to which' is the latest possible date of a non-punctual event (period, era, etc.), whereas a terminus a quo 'limit from which' is the earliest. The concepts are similar to those of upper and lower bounds in mathematics.
+These terms are often used in archaeological and historical studies, such as dating layers in excavated sites, coins, historical events, authors, inscriptions or texts where the exact dates may not be known or may be in dispute.
+
+
+== Example ==
+
+For example, consider an archaeological find of a burial that contains coins dating to 1588, 1595, and others less securely dated to 1590–1625. The terminus post quem for the burial would be the latest date established with certainty: in this case, 1595. Securely dating another coin to an earlier year would not shift the terminus post quem, whereas securely dating one of the uncertain coins to 1625 would make that year the terminus post quem.
+An archaeological example of a terminus ante quem would be deposits formed before a historically dateable event, such as building foundations that were partly demolished to make way for the construction of a city wall. If it is known that the wall was finished in 650, then the foundations must have been demolished in 650 or earlier; all that can be said from the evidence is that it happened before the known event.
+Other examples of things that may establish a terminus are known dates of death or travel by persons involved, a particular form of heraldry that can be dated (see pastiglia for example), references to reigning monarchs or office-holders, or a placing relative to any other events whose date is securely known. In a modern context, dated images, such as those available in Google Earth, may establish termini.
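+The coin and city-wall examples above can be written as simple bounds. This is only a sketch in notation assumed here, not taken from the cited literature: $d_1, \dots, d_n$ stand for the securely established dates of the finds, and $t$ for the unknown date of the event.
+
+```latex
+\begin{align*}
+\mathrm{TPQ} &= \max(d_1, \dots, d_n), & t &\ge \mathrm{TPQ} \\
+\text{coins of 1588 and 1595:}\quad \mathrm{TPQ} &= \max(1588, 1595) = 1595, & t &\ge 1595 \\
+\text{a further coin secured to 1625:}\quad \mathrm{TPQ} &= \max(1588, 1595, 1625) = 1625, & t &\ge 1625 \\
+\text{wall finished in 650:}\quad \mathrm{TAQ} &= 650, & t &\le 650
+\end{align*}
+```
+
+Taking the maximum is why a securely dated older coin never moves the terminus post quem: it can only add a value that is already below the current maximum.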
+
+
+== Related terms ==
+A terminus ante quem non differs from a terminus post quem by not implying the event necessarily took place.  'Event E happened after time T' implies E occurred, whereas 'event E did not happen before time T' leaves open the possibility that E never occurred at all.
+In project planning, sometimes the phrases "no earlier than" / "no later than" (NET/NLT) are used.
+
+
+== See also ==
+Interval (time)
+List of Latin phrases
+Relative dating
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Thermoluminescence_dating-0.md b/data/en.wikipedia.org/wiki/Thermoluminescence_dating-0.md
new file mode 100644
index 000000000..344c4f95f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Thermoluminescence_dating-0.md
@@ -0,0 +1,36 @@
+---
+title: "Thermoluminescence dating"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Thermoluminescence_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:46.871512+00:00"
+instance: "kb-cron"
+---
+
+Thermoluminescence dating (TL) is the determination, by means of measuring the accumulated radiation dose, of the time elapsed since material containing crystalline minerals was either heated (lava, ceramics) or exposed to sunlight (sediments). As a crystalline material is heated during measurements, the process of  thermoluminescence starts. Thermoluminescence emits a weak light signal that is proportional to the radiation dose absorbed by the material. It is a type of luminescence dating.
+The technique has wide application, and is relatively cheap at some US$300–700 per object; ideally a number of samples are tested.  Sediments are more expensive to date. The destruction of a relatively significant amount of sample material is necessary, which can be a limitation in the case of artworks. The heating must have taken the object above 500 °C, which covers most ceramics, although very high-fired porcelain creates other difficulties.  It will often work well with stones that have been heated by fire. The clay core of bronze sculptures made by lost wax casting is also able to be tested.
+Different materials vary considerably in their suitability for the technique, depending on several factors.  Subsequent irradiation, for example if an x-ray is taken, can affect accuracy, as will the "annual dose" of radiation a buried object has received from the surrounding soil. Ideally this is assessed by measurements made at the precise findspot over a long period. For artworks, it may be sufficient to confirm whether a piece is broadly ancient or modern (that is, authentic or a fake), and this may be possible even if a precise date cannot be estimated.
+
+== Functionality ==
+Natural crystalline materials contain imperfections: impurity ions, stress dislocations, and other phenomena that disturb the regularity of the electric field that holds the atoms in the crystalline lattice together.  These imperfections lead to local humps and dips in the crystalline material's electric potential. Where there is a dip (a so-called "electron trap"), a free electron may be attracted and trapped.
+The flux of ionizing radiation—both from cosmic radiation and from natural radioactivity—excites electrons from atoms in the crystal lattice into the conduction band where they can move freely.  Most excited electrons will soon recombine with lattice ions, but some will be trapped, storing part of the energy of the radiation in the form of trapped electric charge (Figure 1).
+The storage time of trapped electrons varies with the depth of the traps (the energy required to free an electron from them); some traps are sufficiently deep to store charge for hundreds of thousands of years.
+
+== In practical use ==
+Another important technique in testing samples from a historic or archaeological site is a process known as thermoluminescence testing, which involves the principle that all
+objects absorb radiation from the environment. This process frees electrons within elements or minerals that remain caught within the item. Thermoluminescence testing involves
+heating a sample until it releases a type of light, which is then measured to determine the last time the item was heated.
+In thermoluminescence dating, these long-term traps are used to determine the age of materials: When irradiated crystalline material is again heated or exposed to strong light, the trapped electrons are given sufficient energy to escape. In the process of recombining with a lattice ion, they lose energy and emit photons (light quanta), detectable in the laboratory.
+The amount of light produced is proportional to the number of trapped electrons that have been freed, which is in turn proportional to the radiation dose accumulated. To relate the signal (the thermoluminescence, that is, the light produced when the material is heated) to the radiation dose that caused it, it is necessary to calibrate the material with known doses of radiation, since the density of traps is highly variable.
+Thermoluminescence dating presupposes a "zeroing" event in the history of the material, either heating (in the case of pottery or lava) or exposure to sunlight (in the case of sediments), that removes the pre-existing trapped electrons. Therefore, at that point the thermoluminescence signal is zero.
+As time goes on, the ionizing radiation field around the material causes the trapped electrons to accumulate (Figure 2). In the laboratory, the accumulated radiation dose can be measured, but this by itself is insufficient to determine the time since the zeroing event.
+The radiation dose rate (the dose accumulated per year) must be determined first. This is commonly done by measurement of the alpha radioactivity (the uranium and thorium content) and the potassium content (K-40 is a beta and gamma emitter) of the sample material.
+Often the gamma radiation field at the position of the sample material is measured, or it may be calculated from the alpha radioactivity and potassium content of the sample environment, and the cosmic ray dose is added in.  Once all components of the radiation field are determined, the accumulated dose from the thermoluminescence measurements is divided by the dose accumulating each year, to obtain the years since the zeroing event.
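+A worked form of this final division may help. The symbols and the numbers below are illustrative assumptions, not values from a cited measurement, and the alpha term is shown without the efficiency correction that laboratories may apply. Here $D_{\mathrm{acc}}$ is the accumulated dose obtained from the thermoluminescence measurement and $\dot{D}$ is the total annual dose rate.
+
+```latex
+\begin{align*}
+\dot{D} &= \dot{D}_{\alpha} + \dot{D}_{\beta} + \dot{D}_{\gamma} + \dot{D}_{\mathrm{cosmic}} \\
+t &= \frac{D_{\mathrm{acc}}}{\dot{D}} \\
+\text{assumed example:}\quad t &= \frac{6\ \mathrm{Gy}}{0.003\ \mathrm{Gy/yr}} = 2000\ \text{years since the zeroing event}
+\end{align*}
+```
+
+Because the age is a ratio, a small percentage error in the dose-rate estimate carries over to roughly the same percentage error in the age, which is one reason on-site measurement of the radiation field matters.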
+
+== Relation to radiocarbon dating ==
+Thermoluminescence dating is used for material where radiocarbon dating is not available, like sediments. Its use is now common in the authentication of old ceramic wares, for which it gives the approximate date of the last firing. An example of this can be seen in Rink and Bartoll, 2005.
+Thermoluminescence dating was modified for use as a passive sand migration analysis tool by Keizars, et al., 2008 (Figure 3), demonstrating the direct consequences resulting from the improper replenishment of starving beaches using fine sands, as well as providing a passive method of policing sand replenishment and observing riverine or other sand inputs along shorelines (Figure 4).
+
+== Relation to other luminescence dating methods ==
+Optically stimulated luminescence dating is a related measurement method which replaces heating with exposure to intense light.  The sample material is illuminated with a very bright source of green or blue light (for quartz) or infrared light (for potassium feldspar).  Ultraviolet light emitted by the sample is detected for measurement.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Thermoluminescence_dating-1.md b/data/en.wikipedia.org/wiki/Thermoluminescence_dating-1.md
new file mode 100644
index 000000000..1ebba9dd3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Thermoluminescence_dating-1.md
@@ -0,0 +1,33 @@
+---
+title: "Thermoluminescence dating"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Thermoluminescence_dating"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:46.871512+00:00"
+instance: "kb-cron"
+---
+
+== See also ==
+Geochronology
+Luminescence dating
+Rehydroxylation dating
+Thermoluminescent dosimeter
+
+== Notes ==
+Oxford Authentication: Home - TL Testing Authentication 'Oxford Authentication® Ltd authenticates ceramic antiquities using the scientific technique of thermoluminescence (TL). TL testing is a dating method for archaeological items which can distinguish between genuine and fake antiquities.' See some of their case studies here: https://www.oxfordauthentication.com/case-studies/
+
+== References and bibliography ==
+GlobalNet.co.uk, Quaternary TL Surveys - Guide to thermoluminescence date measurement
+Aitken, M.J., Thermoluminescence Dating, Academic Press, London (1985) – Standard text for introduction to the field. Quite complete and rather technical, but well written and well organized. There is a second edition.
+Aitken, M.J., Introduction to Optical Dating, Oxford University Press (1998) – Good introduction to the field.
+Keizars, K.Z. 2003. NRTL as a method of analysis of sand transport along the coast of the St. Joseph Peninsula, Florida. GAC/MAC 2003. Presentation: Brock University, St. Catharines, Ontario, Canada.
+JCRonline.org, Ķeizars, Z., Forrest, B., Rink, W.J. 2008. Natural Residual Thermoluminescence as a Method of Analysis of Sand Transport along the Coast of the St. Joseph Peninsula, Florida. Journal of Coastal Research, 24: 500–507.
+Keizars, Z. 2008b. NRTL trends observed in the sands of St. Joseph Peninsula, Florida. Queen's University. Presentation: Queen's University, Kingston, Ontario, Canada.
+Liritzis, I., 2011. Surface Dating by Luminescence: An Overview. Geochronometria, 38(3): 292–302.
+Mortlock, AJ; Price, D and Gardiner, G. The Discovery and Preliminary Thermoluminescence Dating of Two Aboriginal Cave Shelters in the Selwyn Ranges, Queensland. Australian Archaeology, No. 9, Nov 1979: 82–86. ISSN 0312-2417.
+Antiquity.ac.uk, Rink, W. J., Bartoll, J. 2005. Dating the geometric Nasca lines in the Peruvian desert. Antiquity, 79: 390–401.
+Sullasi, H. S., Andrade, M. B., Ayta, W. E. F., Frade, M., Sastry, M. D., & Watanabe, S. (2004). Irradiation for dating Brazilian fish fossil by thermoluminescence and EPR technique. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 213, 756–760.doi:10.1016/S0168-583X(03)01698-7
+
+== External links ==
+Brief introduction on TL technique - Link no longer valid (Oct 2022)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Timeline_of_snowflake_research-0.md b/data/en.wikipedia.org/wiki/Timeline_of_snowflake_research-0.md
index 1c5564288..4fb4304f6 100644
--- a/data/en.wikipedia.org/wiki/Timeline_of_snowflake_research-0.md
+++ b/data/en.wikipedia.org/wiki/Timeline_of_snowflake_research-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Timeline_of_snowflake_research"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:35:41.253572+00:00"
+date_saved: "2026-05-05T09:56:55.831024+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-0.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-0.md
new file mode 100644
index 000000000..b29beff70
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-0.md
@@ -0,0 +1,22 @@
+---
+title: "Topological data analysis"
+chunk: 1/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen and provides dimensionality reduction and robustness to noise.  Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.
+The initial motivation is to study the shape of data. TDA has combined algebraic topology and other tools from pure mathematics to allow mathematically rigorous study of "shape". The main tool is persistent homology, an adaptation of homology to point cloud data. Persistent homology has been applied to many types of data across many fields. Moreover, its mathematical foundation is also of theoretical importance. The unique features of TDA make it a promising bridge between topology and geometry.
+
+== Basic theory ==
+
+=== Intuition ===
+TDA is premised on the idea that the shape of data sets contains relevant information. Real high-dimensional data is typically sparse, and tends to have relevant low-dimensional features. One task of TDA is to provide a precise characterization of this fact. For example, the trajectory of a simple predator-prey system governed by the Lotka–Volterra equations forms a closed loop in state space. TDA provides tools to detect and quantify such recurrent motion.
+Many algorithms for data analysis, including those used in TDA, require setting various parameters. Without prior domain knowledge, the correct collection of parameters for a data set is difficult to choose. The main insight of persistent homology is to use the information obtained from all parameter values by encoding this huge amount of information into an understandable and easy-to-represent form. With TDA, there is a mathematical interpretation when the information is a homology group. In general, the assumption is that features that persist for a wide range of parameters are "true" features. Features persisting for only a narrow range of parameters are presumed to be noise, although the theoretical justification for this is unclear.
+
+=== Early history ===
+Precursors to the full concept of persistent homology appeared gradually over time. In 1990, Patrizio Frosini introduced a pseudo-distance between submanifolds, and later the size function, which on one-dimensional curves is equivalent to the 0th persistent homology. Nearly a decade later, Vanessa Robins studied the images of homomorphisms induced by inclusion. Finally, shortly thereafter, Herbert Edelsbrunner et al. introduced the concept of persistent homology together with an efficient algorithm and its visualization as a persistence diagram. Gunnar Carlsson et al. reformulated the initial definition and gave an equivalent visualization method called persistence barcodes, interpreting persistence in the language of commutative algebra.
+In algebraic topology, persistent homology emerged through the work of Sergey Barannikov on Morse theory. The set of critical values of a smooth Morse function was canonically partitioned into "birth–death" pairs, and filtered complexes were classified. Their invariants, equivalent to the persistence diagram and persistence barcodes, were described by Barannikov in 1994 under the name of canonical forms, together with an efficient algorithm for their calculation.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-1.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-1.md
new file mode 100644
index 000000000..06e511c86
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-1.md
@@ -0,0 +1,1022 @@
+---
+title: "Topological data analysis"
+chunk: 2/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+=== Concepts ===
+Some widely used concepts are introduced below. Note that some definitions may vary from author to author.
+A point cloud is often defined as a finite set of points in some Euclidean space, but may be taken to be any finite metric space.
+The Čech complex of a point cloud is the nerve of the cover of balls of a fixed radius around each point in the cloud.
+A persistence module $\mathbb{U}$ indexed by $\mathbb{Z}$ is a vector space $U_t$ for each $t\in\mathbb{Z}$, and a linear map $u_t^s\colon U_s\to U_t$ whenever $s\leq t$, such that $u_t^t=1$ for all $t$ and $u_t^s u_s^r=u_t^r$ whenever $r\leq s\leq t$. An equivalent definition is a functor from $\mathbb{Z}$ considered as a partially ordered set to the category of vector spaces.
+The persistent homology group $PH$ of a point cloud is the persistence module defined as $PH_k(X)=\prod H_k(X_r)$, where $X_r$ is the Čech complex of radius $r$ of the point cloud $X$ and $H_k$ is the homology group.
+A persistence barcode is a multiset of intervals in $\mathbb{R}$, and a persistence diagram is a multiset of points in $\Delta := \{(u,v)\in\mathbb{R}^2 \mid u,v\geq 0,\ u\leq v\}$.
+The Wasserstein distance between two persistence diagrams $X$ and $Y$ is defined as
+
+$$W_p[L_q](X,Y) := \inf_{\varphi\colon X\to Y}\left[\sum_{x\in X}\left(\lVert x-\varphi(x)\rVert_q\right)^p\right]^{1/p},$$
+
+where $1\leq p,q\leq\infty$ and $\varphi$ ranges over bijections between $X$ and $Y$. Please refer to figure 3.1 in Munch for illustration.
+The bottleneck distance between $X$ and $Y$ is
+
+$$W_\infty[L_q](X,Y) := \inf_{\varphi\colon X\to Y}\sup_{x\in X}\lVert x-\varphi(x)\rVert_q.$$
+
+This is a special case of Wasserstein distance, letting $p=\infty$.
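+
+As a small worked example of the two distances, consider the made-up diagrams $X=\{x_1,x_2\}$ with $x_1=(0,2)$, $x_2=(1,4)$ and $Y=\{y_1,y_2\}$ with $y_1=(0,3)$, $y_2=(1,5)$, and take $q=\infty$. These values are assumptions chosen for illustration; both diagrams have two points, so exactly two bijections exist, as the definition requires. The sketch below evaluates both bijections and takes the smaller cost:
+
+```latex
+% Bijection \varphi_1: x_1 \mapsto y_1, x_2 \mapsto y_2
+\lVert x_1 - y_1 \rVert_\infty = \max(0,1) = 1, \qquad
+\lVert x_2 - y_2 \rVert_\infty = \max(0,1) = 1
+% Bijection \varphi_2: x_1 \mapsto y_2, x_2 \mapsto y_1
+\lVert x_1 - y_2 \rVert_\infty = \max(1,3) = 3, \qquad
+\lVert x_2 - y_1 \rVert_\infty = \max(1,1) = 1
+% Resulting distances
+\begin{align*}
+W_1[L_\infty](X,Y) &= \min(1+1,\ 3+1) = 2, \\
+W_2[L_\infty](X,Y) &= \min\bigl(\sqrt{1^2+1^2},\ \sqrt{3^2+1^2}\bigr) = \sqrt{2}, \\
+W_\infty[L_\infty](X,Y) &= \min\bigl(\max(1,1),\ \max(3,1)\bigr) = 1.
+\end{align*}
+```
+
+In all three cases the infimum is attained by $\varphi_1$, which matches each point to its nearest counterpart.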
+
+=== Basic property ===
+
+==== Structure theorem ====
+The first classification theorem for persistent homology appeared in 1994 via Barannikov's canonical forms. The classification theorem interpreting persistence in the language of commutative algebra appeared in 2005: for a finitely generated persistence module $C$ with field $F$ coefficients,
+
+$$H(C;F)\simeq\bigoplus_i x^{t_i}\cdot F[x]\oplus\left(\bigoplus_j x^{r_j}\cdot\bigl(F[x]/(x^{s_j}\cdot F[x])\bigr)\right).$$
+
+Intuitively, the free parts correspond to the homology generators that appear at filtration level $t_i$ and never disappear, while the torsion parts correspond to those that appear at filtration level $r_j$ and last for $s_j$ steps of the filtration (or equivalently, disappear at filtration level $s_j+r_j$).
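+
+To make the correspondence concrete, here is a hypothetical module with one free summand ($t_1=0$) and one torsion summand ($r_1=1$, $s_1=3$). The numbers are chosen only for illustration, and writing bars as half-open intervals is a common convention assumed here rather than something fixed by the theorem:
+
+```latex
+H(C;F) \;\simeq\; x^{0}\cdot F[x] \;\oplus\; x^{1}\cdot\bigl(F[x]/(x^{3}\cdot F[x])\bigr)
+% free summand:    born at level t_1 = 0, never dies          -> bar [0, \infty)
+% torsion summand: born at level r_1 = 1, dies at 1 + 3 = 4   -> bar [1, 4)
+\text{barcode} \;=\; \{\, [0,\infty),\ [1,4) \,\}
+```
+
+The torsion generator is alive at levels 1, 2 and 3, that is, for $s_1=3$ steps, before it disappears at level $s_1+r_1=4$.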
+Persistent homology is visualized through a barcode or persistence diagram. The barcode has its root in abstract mathematics. Namely, the category of finite filtered complexes over a field is semi-simple. Any filtered complex is isomorphic to its canonical form, a direct sum of one- and two-dimensional simple filtered complexes.
+
+==== Stability ====
+Stability is desirable because it provides robustness against noise. If $X$ is any space which is homeomorphic to a simplicial complex, and $f,g\colon X\to\mathbb{R}$ are continuous tame functions, then the persistence vector spaces $\{H_k(f^{-1}([0,r]))\}$ and $\{H_k(g^{-1}([0,r]))\}$ are finitely presented, and $W_\infty(D(f),D(g))\leq\lVert f-g\rVert_\infty$, where $W_\infty$ refers to the bottleneck distance and $D$ is the map taking a continuous tame function to the persistence diagram of its $k$-th homology.
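+
+A simple sanity check of the bound, under the assumption that $f\geq 0$ and that $g$ is a constant shift $g=f+c$ with $c>0$ (an illustrative special case, ignoring any points at infinity):
+
+```latex
+% Sublevel sets of g are sublevel sets of f at a shifted threshold
+g^{-1}([0,r]) = f^{-1}([0,\,r-c])
+% so every point of D(f) is translated by (c,c)
+(b,d)\in D(f) \;\longmapsto\; (b+c,\ d+c)\in D(g)
+% and the bijection \varphi(b,d) = (b+c, d+c) costs exactly c for every point
+W_\infty(D(f),D(g)) \;\leq\; \sup_{(b,d)\in D(f)} \lVert (c,c)\rVert_\infty \;=\; c \;=\; \lVert f-g\rVert_\infty
+```
+
+The explicit matching already meets the bound, so in this case a perturbation of size $c$ moves the diagram by at most $c$, as the theorem states.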
+
+=== Workflow ===
+The basic workflow in TDA is:
+
+If $X$ is a point cloud, replace $X$ with a nested family of simplicial complexes $X_r$ (such as the Čech or Vietoris–Rips complex). This process converts the point cloud into a filtration of simplicial complexes. Taking the homology of each complex in this filtration gives a persistence module $H_i(X_{r_0})\to H_i(X_{r_1})\to H_i(X_{r_2})\to\cdots$
+
+Apply the structure theorem to obtain the persistent Betti numbers, persistence diagram, or equivalently, barcode.
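+
+The workflow can be traced by hand on a tiny made-up point cloud, $X=\{0,1,3\}\subset\mathbb{R}$. The sketch assumes the Vietoris–Rips convention in which a simplex enters $X_r$ once all pairwise distances among its vertices are at most $r$ (some authors use $2r$ instead):
+
+```latex
+% Filtration
+X_r = \begin{cases}
+  \{0\},\{1\},\{3\}                      & 0 \le r < 1 \\
+  \text{add edge } \{0,1\}               & 1 \le r < 2 \\
+  \text{add edge } \{1,3\}               & 2 \le r < 3 \\
+  \text{add edge } \{0,3\} \text{ and triangle } \{0,1,3\} & r \ge 3
+\end{cases}
+% Persistence module in degree 0 at r = 0, 1, 2, 3
+H_0(X_0)\to H_0(X_1)\to H_0(X_2)\to H_0(X_3):\qquad F^3 \to F^2 \to F \to F
+% Barcode in degree 0
+\{\, [0,\infty),\ [0,1),\ [0,2) \,\}
+```
+
+One component survives forever, while the other two merge into it at $r=1$ and $r=2$. No bar appears in degree 1, because the edge $\{0,3\}$ and the triangle that fills the resulting loop enter at the same value $r=3$.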
+Graphically speaking, 
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-2.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-2.md
new file mode 100644
index 000000000..4e872cab5
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-2.md
@@ -0,0 +1,296 @@
+---
+title: "Topological data analysis"
+chunk: 3/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+== Computation ==
+The first algorithm over all fields for persistent homology in the algebraic topology setting was described by Barannikov through reduction to the canonical form by upper-triangular matrices. The algorithm for persistent homology over $F_2$ was given by Edelsbrunner et al. Afra Zomorodian and Carlsson gave the practical algorithm to compute persistent homology over all fields. Edelsbrunner and Harer's book gives general guidance on computational topology.
+One issue that arises in computation is the choice of complex. The Čech complex and the Vietoris–Rips complex are most natural at first glance; however, their size grows rapidly with the number of data points. The Vietoris–Rips complex is preferred over the Čech complex because its definition is simpler and the Čech complex requires extra effort to define in a general finite metric space. Efficient ways to lower the computational cost of homology have been studied. For example, the α-complex and witness complex are used to reduce the dimension and size of complexes.
+Recently, discrete Morse theory has shown promise for computational homology because it can reduce a given simplicial complex to a much smaller cellular complex which is homotopy equivalent to the original one. This reduction can in fact be performed as the complex is constructed by using matroid theory, leading to further performance increases. Another recent algorithm saves time by ignoring the homology classes with low persistence.
+Various software packages are available, such as javaPlex, Dionysus, Perseus, PHAT, DIPHA, GUDHI, Ripser, and TDAstats. A comparison between these tools is given by Otter et al. Giotto-tda is a Python package dedicated to integrating TDA in the machine learning workflow by means of a scikit-learn API. An R package TDA is capable of calculating recently invented concepts like landscape and the kernel distance estimator. The Topology ToolKit is specialized for continuous data defined on manifolds of low dimension (1, 2 or 3), as typically found in scientific visualization. Cubicle is optimized for large (gigabyte-scale) grayscale image data in dimension 1, 2 or 3 using cubical complexes and discrete Morse theory. Another R package, TDAstats, uses the Ripser library to calculate persistent homology.
+
+== Visualization ==
+High-dimensional data is impossible to visualize directly. Many methods have been invented to extract a low-dimensional structure from the data set, such as principal component analysis and multidimensional scaling. However, it is important to note that the problem itself is ill-posed, since many different topological features can be found in the same data set. Thus, the study of visualization of high-dimensional spaces is of central importance to TDA, although it does not necessarily involve the use of persistent homology. However, recent attempts have been made to use persistent homology in data visualization.
+Carlsson et al. have proposed a general method called MAPPER. It inherits the idea of Jean-Pierre Serre that a covering preserves homotopy. A generalized formulation of MAPPER is as follows:
+Let $X$ and $Z$ be topological spaces and let $f\colon X\to Z$ be a continuous map. Let $\mathbb{U}=\{U_\alpha\}_{\alpha\in A}$ be a finite open covering of $Z$. The output of MAPPER is the nerve of the pullback cover $M(\mathbb{U},f):=N(f^{-1}(\mathbb{U}))$, where each preimage is split into its connected components. This is a very general concept, of which the Reeb graph and merge trees are special cases.
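+
+A standard toy case illustrates the construction. The circle, height function and three-set cover below are assumptions chosen for illustration, not taken from the original paper:
+
+```latex
+% Space, filter function and cover
+X = S^1 \subset \mathbb{R}^2, \qquad f(x,y) = y, \qquad Z = [-1,1]
+U_1 = [-1,-\tfrac{1}{3}), \qquad U_2 = (-\tfrac{2}{3},\tfrac{2}{3}), \qquad U_3 = (\tfrac{1}{3},1]
+% Connected components of the preimages
+f^{-1}(U_1) = B \ \text{(bottom arc)}, \qquad
+f^{-1}(U_2) = L \sqcup R \ \text{(left and right arcs)}, \qquad
+f^{-1}(U_3) = T \ \text{(top arc)}
+% Nerve: one vertex per component, one edge per nonempty pairwise intersection
+\text{vertices } \{B, L, R, T\}, \qquad \text{edges } \{B,L\},\ \{B,R\},\ \{T,L\},\ \{T,R\}
+```
+
+No three components share a point, so $M(\mathbb{U},f)$ is a cycle on four vertices, which recovers the loop of the circle.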
+This is not quite the original definition. Carlsson et al. choose $Z$ to be $\mathbb{R}$ or $\mathbb{R}^2$, and cover it with open sets such that at most two intersect. This restriction means that the output is in the form of a complex network. Because the topology of a finite point cloud is trivial, clustering methods (such as single linkage) are used to produce the analogue of connected sets in the preimage $f^{-1}(U)$ when MAPPER is applied to actual data.
+Mathematically speaking, MAPPER is a variation of the Reeb graph. If $M(\mathbb{U},f)$ is at most one-dimensional, then for each $i\geq 0$, $H_i(X)\simeq H_0(N(\mathbb{U});\hat{F}_i)\oplus H_1(N(\mathbb{U});\hat{F}_{i-1})$. The added flexibility also has disadvantages. One problem is instability, in that a small change in the choice of the cover can lead to a major change in the output of the algorithm. Work has been done to overcome this problem.
+Three successful applications of MAPPER can be found in Carlsson et al. A comment on the applications in this paper by J. Curry is that "a common feature of interest in applications is the presence of flares or tendrils".
+A free implementation of MAPPER written by Daniel Müllner and Aravindakshan Babu is available online. MAPPER also forms the basis of Ayasdi's AI platform.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-3.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-3.md
new file mode 100644
index 000000000..f24c244fd
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-3.md
@@ -0,0 +1,616 @@
+---
+title: "Topological data analysis"
+chunk: 4/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+== Multidimensional persistence ==
+Multidimensional persistence is important to TDA. The concept arises in both theory and practice. The first investigation of multidimensional persistence was early in the development of TDA. Carlsson and Zomorodian introduced the theory of multidimensional persistence and, in collaboration with Singh, introduced the use of tools from symbolic algebra (Gröbner basis methods) to compute MPH modules. Their definition presents multidimensional persistence with n parameters as a $\mathbb{Z}^n$-graded module over a polynomial ring in n variables. Tools from commutative and homological algebra are applied to the study of multidimensional persistence in work of Harrington-Otter-Schenck-Tillman. The first application to appear in the literature is a method for shape comparison, similar to the invention of TDA.
+The definition of an n-dimensional persistence module in $\mathbb{R}^n$ is
+
+a vector space $V_s$ is assigned to each point $s=(s_1,\ldots,s_n)$
+a map $\rho_s^t\colon V_s\to V_t$ is assigned if $s\leq t$ ($s_i\leq t_i,\ i=1,\ldots,n$)
+the maps satisfy $\rho_r^t=\rho_s^t\circ\rho_r^s$ for all $r\leq s\leq t$
+
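+For $n=2$ the composition rule amounts to commuting squares. A short derivation, with the points $a,b,c,d$ introduced here purely for illustration: let $a=(s_1,s_2)$, $b=(s_1+1,s_2)$, $c=(s_1,s_2+1)$ and $d=(s_1+1,s_2+1)$, so that $a\leq b\leq d$ and $a\leq c\leq d$. Applying the rule to both chains gives
+
+```latex
+\rho_b^d \circ \rho_a^b \;=\; \rho_a^d \;=\; \rho_c^d \circ \rho_a^c
+```
+
+so moving along the first parameter and then the second yields the same map $V_a\to V_d$ as moving in the opposite order.
+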
+There are controversies over the definition of multidimensional persistence.
+One of the advantages of one-dimensional persistence is its representability by a diagram or barcode. However, discrete complete invariants of multidimensional persistence modules do not exist. The main reason is that, by Gabriel's theorem in the theory of quiver representations, the structure of the collection of indecomposables is extremely complicated, although a finitely generated n-dimensional persistence module can be uniquely decomposed into a direct sum of indecomposables due to the Krull-Schmidt theorem.
+Nonetheless, many results have been established. Carlsson and Zomorodian introduced the rank invariant $\rho_M(u,v)$, defined as $\rho_M(u,v)=\mathrm{rank}(x^{v-u}\colon M_u\to M_v)$, in which $M$ is a finitely generated n-graded module. In one dimension, it is equivalent to the barcode. In the literature, the rank invariant is often referred to as the persistent Betti numbers (PBNs). In many theoretical works, authors have used a more restricted definition, an analogue from sublevel set persistence. Specifically, the persistent Betti numbers of a function $f\colon X\to\mathbb{R}^k$ are given by the function $\beta_f\colon\Delta^{+}\to\mathrm{N}$
+, taking each 
+  
+    
+      
+        (
+        u
+        ,
+        v
+        )
+        ∈
+        
+          Δ
+          
+            +
+          
+        
+      
+    
+    {\displaystyle (u,v)\in \Delta ^{+}}
+  
+ to 
+  
+    
+      
+        
+          β
+          
+            f
+          
+        
+        (
+        u
+        ,
+        v
+        )
+        :=
+        
+          r
+          a
+          n
+          k
+        
+        (
+        H
+        (
+        X
+        (
+        f
+        ≤
+        u
+        )
+        →
+        H
+        (
+        X
+        (
+        f
+        ≤
+        v
+        )
+        )
+        )
+      
+    
+    {\displaystyle \beta _{f}(u,v):=\mathrm {rank} (H(X(f\leq u)\to H(X(f\leq v)))}
+  
+, where 
+  
+    
+      
+        
+          Δ
+          
+            +
+          
+        
+        :=
+        {
+        (
+        u
+        ,
+        v
+        )
+        ∈
+        
+          
+            R
+          
+          
+            k
+          
+        
+        ×
+        
+          
+            R
+          
+          
+            k
+          
+        
+        :
+        u
+        ≤
+        v
+        }
+      
+    
+    {\displaystyle \Delta ^{+}:=\{(u,v)\in \mathbb {R} ^{k}\times \mathbb {R} ^{k}:u\leq v\}}
+  
+ and 
+  
+    
+      
+        X
+        (
+        f
+        ≤
+        u
+        )
+        :=
+        {
+        x
+        ∈
+        X
+        :
+        f
+        (
+        x
+        )
+        ≤
+        u
+        }
+      
+    
+    {\displaystyle X(f\leq u):=\{x\in X:f(x)\leq u\}}
+  
+.
+Some basic properties include monotonicity and diagonal jump. Persistent Betti numbers will be finite if {\displaystyle X} is a compact and locally contractible subspace of {\displaystyle \mathbb {R} ^{n}}.
+Using a foliation method, the k-dimensional PBNs can be decomposed into a family of 1-dimensional PBNs by dimensionality reduction. This method has also led to a proof that multidimensional PBNs are stable. The discontinuities of PBNs occur only at points {\displaystyle (u,v)} with {\displaystyle u\leq v} where either {\displaystyle u} is a point of discontinuity of {\displaystyle \rho _{M}(\star ,v)} or {\displaystyle v} is a point of discontinuity of {\displaystyle \rho _{M}(u,\star )}, under the assumption that {\displaystyle f\in C^{0}(X,\mathbb {R} ^{k})} and {\displaystyle X} is a compact, triangulable topological space.
+The persistent space, a generalization of the persistence diagram, is defined as the multiset of all points with multiplicity larger than 0, together with the diagonal. It provides a stable and complete representation of PBNs. Ongoing work by Carlsson et al. aims to give a geometric interpretation of persistent homology, which might provide insights into how to combine machine learning theory with topological data analysis.
+The first practical algorithm to compute multidimensional persistence was invented very early. Since then, many other algorithms have been proposed, based on concepts such as discrete Morse theory and finite sample estimation.
+
+== Other persistences ==
+The standard paradigm in TDA is often referred to as sublevel persistence. Apart from multidimensional persistence, much work has been done to extend this special case.
+
+=== Zigzag persistence ===
+The nonzero maps in a persistence module are restricted by the preorder relation in the category. However, mathematicians have found that a uniform direction of the maps is not essential to many results. "The philosophical point is that the decomposition theory of graph representations is somewhat independent of the orientation of the graph edges". Zigzag persistence is important on the theoretical side. The examples given in Carlsson's review paper to illustrate the importance of functoriality all share some of its features.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-4.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-4.md
new file mode 100644
index 000000000..ad9286d5c
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-4.md
@@ -0,0 +1,164 @@
+---
+title: "Topological data analysis"
+chunk: 5/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+=== Extended persistence and levelset persistence ===
+There have been some attempts to loosen the strict restrictions on the function. Please refer to the Categorification and cosheaves and Impact on mathematics sections for more information.
+It is natural to extend persistent homology to other basic concepts in algebraic topology, such as cohomology and relative homology/cohomology. An interesting application is the computation of circular coordinates for a data set via the first persistent cohomology group.
+
+=== Circular persistence ===
+Ordinary persistent homology studies real-valued functions. Circle-valued maps can also be useful: "persistence theory for circle-valued maps promises to play the role for some vector fields as does the standard persistence theory for scalar fields", as commented in Dan Burghelea et al. The main difference is that Jordan cells (very similar in format to the Jordan blocks in linear algebra) are nontrivial for circle-valued functions, whereas they would be zero in the real-valued case; combined with barcodes, they give the invariants of a tame map, under moderate conditions.
+The two techniques they use are Morse-Novikov theory and graph representation theory. More recent results can be found in D. Burghelea et al. For example, the tameness requirement can be replaced by the much weaker condition of continuity.
+
+=== Persistence with torsion ===
+The proof of the structure theorem relies on the base domain being a field, so not many attempts have been made at persistent homology with torsion. Frosini defined a pseudometric on this specific module and proved its stability. One novelty is that it does not depend on any classification theory to define the metric.
+
+== Categorification and cosheaves ==
+One advantage of category theory is its ability to lift concrete results to a higher level, showing relationships between seemingly unconnected objects. Peter Bubenik et al. offer a short introduction to category theory tailored to TDA.
+Category theory is the language of modern algebra, and has been widely used in the study of algebraic geometry and topology. It has been noted that "the key observation of is that the persistence diagram produced by depends only on the algebraic structure carried by this diagram." The use of category theory in TDA has proved to be fruitful.
+Following the notation of Bubenik et al., the indexing category {\textstyle P} is any preordered set (not necessarily {\displaystyle \mathbb {N} } or {\displaystyle \mathbb {R} }), the target category {\displaystyle D} is any category (instead of the commonly used {\textstyle \mathrm {Vect} _{\mathbb {F} }}), and functors {\textstyle P\to D} are called generalized persistence modules in {\displaystyle D}, over {\textstyle P}.
+One advantage of using category theory in TDA is a clearer understanding of concepts and the discovery of new relationships between proofs. Two examples illustrate this. The first is the correspondence between interleaving and matching, which is of great importance because matching was the method originally used (modified from Morse theory). A summary of this work can be found in Vin de Silva et al. Many theorems can be proved much more easily in this more intuitive setting. The second example is the relationship between different constructions of complexes from point clouds. It has long been noticed that Čech and Vietoris-Rips complexes are related. Specifically, {\displaystyle V_{r}(X)\subset C_{{\sqrt {2}}r}(X)\subset V_{2r}(X)}. The essential relationship between Čech and Rips complexes can be seen much more clearly in categorical language.
+The language of category theory also helps cast results in terms recognizable to the broader mathematical community. Bottleneck distance is widely used in TDA because of the results on stability with respect to the bottleneck distance. In fact, the interleaving distance is the terminal object in a poset category of stable metrics on multidimensional persistence modules over a prime field.
+Sheaves, a central concept in modern algebraic geometry, are intrinsically related to category theory. Roughly speaking, sheaves are the mathematical tool for understanding how local information determines global information. Justin Curry regards level set persistence as the study of fibers of continuous functions. The objects that he studies are very similar to those studied by MAPPER, but with sheaf theory as the theoretical foundation. Although no breakthrough in the theory of TDA has yet used sheaf theory, it is promising since there are many beautiful theorems in algebraic geometry relating to sheaf theory. For example, a natural theoretical question is whether different filtration methods result in the same output.
+
+== Stability ==
+Stability is of central importance to data analysis, since real data carry noise. Using category theory, Bubenik et al. have distinguished between soft and hard stability theorems, and proved that soft cases are formal. Specifically, the general workflow of TDA is
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-5.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-5.md
new file mode 100644
index 000000000..d6d29a708
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-5.md
@@ -0,0 +1,418 @@
+---
+title: "Topological data analysis"
+chunk: 6/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+The soft stability theorem asserts that {\displaystyle HF} is Lipschitz continuous, and the hard stability theorem asserts that {\displaystyle J} is Lipschitz continuous.
+Bottleneck distance is widely used in TDA. The isometry theorem asserts that the interleaving distance {\displaystyle d_{I}} is equal to the bottleneck distance. Bubenik et al. have abstracted the definition to one between functors {\displaystyle F,G\colon P\to D} when {\textstyle P} is equipped with a sublinear projection or superlinear family, in which case it still remains a pseudometric. Given the useful properties of the interleaving distance, the general definition is introduced here (instead of the one first introduced). Let {\displaystyle \Gamma ,K\in \mathrm {Trans_{P}} } be translations of {\textstyle P}, that is, functions from {\textstyle P} to {\textstyle P} that are monotone and satisfy {\displaystyle x\leq \Gamma (x)} for all {\textstyle x\in P}. A {\displaystyle (\Gamma ,K)}-interleaving between F and G consists of natural transformations {\displaystyle \varphi \colon F\Rightarrow G\Gamma } and {\displaystyle \psi \colon G\Rightarrow FK}, such that {\displaystyle (\psi \Gamma )\varphi =F\eta _{K\Gamma }} and {\displaystyle (\varphi K)\psi =G\eta _{\Gamma K}}.
+The two main results are:
+
+Let {\textstyle P} be a preordered set with a sublinear projection or superlinear family. Let {\textstyle H:D\to E} be a functor between arbitrary categories {\textstyle D,E}. Then for any two functors {\textstyle F,G\colon P\to D}, we have {\textstyle d_{I}(HF,HG)\leq d_{I}(F,G)}.
+Let {\textstyle P} be a poset of a metric space {\textstyle Y}, and let {\textstyle X} be a topological space. Let {\textstyle f,g\colon X\to Y} be functions (not necessarily continuous), and let {\textstyle F,G} be the corresponding persistence diagrams. Then {\displaystyle d_{I}(F,G)\leq d_{\infty }(f,g):=\sup _{x\in X}d_{Y}(f(x),g(x))}.
+These two results summarize many results on the stability of different models of persistence.
+For the stability theorem of multidimensional persistence, please refer to the subsection on persistence.
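+
+The second result can be made concrete in the simplest case. The following sketch assumes {\displaystyle Y=\mathbb {R} } with its usual order and takes {\displaystyle F(a)=X(f\leq a)} and {\displaystyle G(a)=X(g\leq a)} to be the sublevel-set filtrations; these choices are made here for illustration only. Writing {\displaystyle \varepsilon =d_{\infty }(f,g)}, the sublevel sets nest as follows:
+
+```latex
+\begin{aligned}
+f(x)\le a &\implies g(x)\le f(x)+\varepsilon \le a+\varepsilon ,\\
+g(x)\le a &\implies f(x)\le g(x)+\varepsilon \le a+\varepsilon ,\\
+F(a) &\subseteq G(a+\varepsilon )\subseteq F(a+2\varepsilon ),\\
+G(a) &\subseteq F(a+\varepsilon )\subseteq G(a+2\varepsilon ).
+\end{aligned}
+```
+
+These inclusions shift the index by {\displaystyle \varepsilon } in each direction, which is the shape of an interleaving and suggests why {\displaystyle d_{I}(F,G)\leq \varepsilon } in this special case; by the first result, applying a homology functor {\displaystyle H} cannot increase this distance.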
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-6.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-6.md
new file mode 100644
index 000000000..aedc631c3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-6.md
@@ -0,0 +1,367 @@
+---
+title: "Topological data analysis"
+chunk: 7/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+== Structure theorem ==
+The structure theorem is of central importance to TDA; as commented by G. Carlsson, "what makes homology useful as a discriminator between topological spaces is the fact that there is a classification theorem for finitely generated abelian groups" (see the fundamental theorem of finitely generated abelian groups).
+The main argument used in the proof of the original structure theorem is the standard structure theorem for finitely generated modules over a principal ideal domain. However, this argument fails if the indexing set is {\displaystyle (\mathbb {R} ,\leq )}.
+In general, not every persistence module can be decomposed into intervals. Many attempts have been made at relaxing the restrictions of the original structure theorem. The case of pointwise finite-dimensional persistence modules indexed by a locally finite subset of {\displaystyle \mathbb {R} } is solved based on the work of Webb. The most notable result is due to Crawley-Boevey, who solved the case of {\displaystyle \mathbb {R} }. Crawley-Boevey's theorem states that any pointwise finite-dimensional persistence module is a direct sum of interval modules.
+To understand the statement of this theorem, some concepts need introducing. An interval in {\displaystyle (\mathbb {R} ,\leq )} is defined as a subset {\displaystyle I\subset \mathbb {R} } having the property that if {\displaystyle r,t\in I} and if there is an {\displaystyle s\in \mathbb {R} } such that {\displaystyle r\leq s\leq t}, then {\displaystyle s\in I} as well. An interval module {\displaystyle k_{I}} assigns to each element {\displaystyle s\in I} the vector space {\displaystyle k} and assigns the zero vector space to elements in {\displaystyle \mathbb {R} \setminus I}. All maps {\displaystyle \rho _{s}^{t}} are the zero map, unless {\displaystyle s,t\in I} and {\displaystyle s\leq t}, in which case {\displaystyle \rho _{s}^{t}} is the identity map. Interval modules are indecomposable.
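+
+As a small illustration of these definitions, consider the module {\displaystyle M=k_{[0,2)}\oplus k_{[1,3)}}; the two intervals are chosen here purely for the example. Each summand contributes 1 to the rank of {\displaystyle \rho _{s}^{t}} exactly when both {\displaystyle s} and {\displaystyle t} lie in its interval, and ranks add over direct sums, which gives:
+
+```latex
+\begin{aligned}
+\operatorname{rank}\rho_{1}^{1.5} &= 1+1 = 2 && \text{since } 1,\,1.5\in[0,2) \text{ and } 1,\,1.5\in[1,3),\\
+\operatorname{rank}\rho_{0.5}^{1.5} &= 1+0 = 1 && \text{since } 0.5\notin[1,3),\\
+\operatorname{rank}\rho_{0.5}^{2.5} &= 0+0 = 0 && \text{since } 2.5\notin[0,2) \text{ and } 0.5\notin[1,3).
+\end{aligned}
+```
+
+Reading off these ranks for all pairs {\displaystyle s\leq t} recovers the two bars, which is the sense in which an interval decomposition behaves like a barcode.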
+Although the result of Crawley-Boevey is a very powerful theorem, it still does not extend to the q-tame case. A persistence module is q-tame if the rank of {\displaystyle \rho _{s}^{t}} is finite for all {\displaystyle s<t}. There are examples of q-tame persistence modules that fail to be pointwise finite. However, it turns out that a similar structure theorem still holds if the features that exist only at one index value are removed. This holds because the infinite-dimensional parts at each index value do not persist, due to the finite-rank condition. Formally, the observable category {\displaystyle \mathrm {Ob} } is defined as {\displaystyle \mathrm {Pers} /\mathrm {Eph} }, in which {\displaystyle \mathrm {Eph} } denotes the full subcategory of {\displaystyle \mathrm {Pers} } whose objects are the ephemeral modules ({\displaystyle \rho _{s}^{t}=0} whenever {\displaystyle s<t}).
+Note that the extended results listed here do not apply to zigzag persistence, since the analogue of a zigzag persistence module over {\displaystyle \mathbb {R} } is not immediately obvious.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-7.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-7.md
new file mode 100644
index 000000000..4763311f5
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-7.md
@@ -0,0 +1,580 @@
+---
+title: "Topological data analysis"
+chunk: 8/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+== Statistics ==
+Real data is always finite, and so its study requires us to take stochasticity into account. Statistical analysis gives us the ability to separate true features of the data from artifacts introduced by random noise. Persistent homology has no inherent mechanism to distinguish between low-probability features and high-probability features.
+One way to apply statistics to topological data analysis is to study the statistical properties of topological features of point clouds. The study of random simplicial complexes offers some insight into statistical topology. Katharine Turner et al. offer a summary of work in this vein.
+A second way is to study probability distributions on the persistence space. The persistence space {\displaystyle B_{\infty }} is {\displaystyle \coprod _{n}B_{n}/{\backsim }}, where {\displaystyle B_{n}} is the space of all barcodes containing exactly {\displaystyle n} intervals and the equivalences are {\displaystyle \{[x_{1},y_{1}],[x_{2},y_{2}],\ldots ,[x_{n},y_{n}]\}\backsim \{[x_{1},y_{1}],[x_{2},y_{2}],\ldots ,[x_{n-1},y_{n-1}]\}} if {\displaystyle x_{n}=y_{n}}. This space is fairly complicated; for example, it is not complete under the bottleneck metric. The first attempt to study it was made by Yuriy Mileyko et al. The space of persistence diagrams {\displaystyle D_{p}} in their paper is defined as {\displaystyle D_{p}:=\left\{d\mid \sum _{x\in d}\left(2\inf _{y\in \Delta }\lVert x-y\rVert \right)^{p}<\infty \right\}}, where {\displaystyle \Delta } is the diagonal line in {\displaystyle \mathbb {R} ^{2}}. A nice property is that {\displaystyle D_{p}} is complete and separable in the Wasserstein metric {\displaystyle W_{p}(u,v)=\left(\inf _{\gamma \in \Gamma (u,v)}\int _{\mathbb {X} \times \mathbb {X} }\rho ^{p}(x,y)\,\mathrm {d} \gamma (x,y)\right)^{1/p}}. Expectation, variance, and conditional probability can be defined in the Fréchet sense. This allows many statistical tools to be ported to TDA. Work on null hypothesis significance tests, confidence intervals, and robust estimates represents notable steps.
+A third way is to consider the cohomology of probabilistic spaces or statistical systems directly. These are called information structures and consist essentially of the triple ($\Omega ,\Pi ,P$): sample space, random variables, and probability laws. Random variables are considered as partitions of the n atomic probabilities (seen as a probability (n-1)-simplex, $|\Omega |=n$) on the lattice of partitions ($\Pi _{n}$). The random variables or modules of measurable functions provide the cochain complexes, while the coboundary is that of the general homological algebra first discovered by Gerhard Hochschild, with a left action implementing the action of conditioning. The first cocycle condition corresponds to the chain rule of entropy, which allows Shannon entropy to be derived uniquely, up to a multiplicative constant, as the first cohomology class. Considering a deformed left action generalises the framework to Tsallis entropies. Information cohomology is an example of a ringed topos. Multivariate k-mutual informations appear in coboundary expressions, and their vanishing, which is related to the cocycle condition, gives equivalent conditions for statistical independence. Minima of mutual informations, also called synergy, give rise to interesting independence configurations analogous to homotopical links. Because of its combinatorial complexity, only the simplicial subcase of the cohomology and of the information structure has been investigated on data. Applied to data, these cohomological tools quantify statistical dependences and independences, including Markov chains and conditional independence, in the multivariate case. Notably, mutual informations generalize the correlation coefficient and covariance to non-linear statistical dependences. These approaches were developed independently and are only indirectly related to persistence methods, but they may be roughly understood in the simplicial case using the Hu Kuo Tin theorem, which establishes a one-to-one correspondence between mutual-information functions and finite measurable functions of a set with the intersection operator, to construct the Čech complex skeleton. Information cohomology offers some direct interpretation and application in terms of neuroscience (neural assembly theory and qualitative cognition), statistical physics, and deep neural networks for which the structure and learning algorithm are imposed by the complex of random variables and the information chain rule.
+Persistence landscapes, introduced by Peter Bubenik, are a different way to represent barcodes, more amenable to statistical analysis. The persistence landscape of a persistence module $M$ is defined as a function $\lambda :\mathbb {N} \times \mathbb {R} \to {\bar {\mathbb {R} }}$, $\lambda (k,t):=\sup(m\geq 0\mid \beta ^{t-m,t+m}\geq k)$, where ${\bar {\mathbb {R} }}$ denotes the extended real line and $\beta ^{a,b}=\mathrm {dim} (\mathrm {im} (M(a\leq b)))$. The space of persistence landscapes has convenient properties: it inherits the good properties of the barcode representation (stability, easy representation, etc.), statistical quantities can be readily defined on it, and some problems in Y. Mileyko et al.'s work, such as the non-uniqueness of expectations, can be overcome. Effective algorithms for computation with persistence landscapes are available. Another approach is to use revised persistence, which is image, kernel and cokernel persistence.
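+As a rough illustration (not taken from the article's sources), the landscape of a barcode can be evaluated pointwise. The sketch below assumes the module is given as a list of bars (b, d), so that $\beta ^{a,b}$ counts the bars containing the window [a, b], and it uses Bubenik's window [t-m, t+m]. Under that assumption, $\lambda (k,t)$ is the k-th largest value of max(0, min(t-b, d-t)) over all bars. The function name `landscape` and the sample bars are hypothetical.

```python
from typing import List, Tuple


def landscape(intervals: List[Tuple[float, float]], k: int, t: float) -> float:
    """Value at t of the k-th persistence landscape function (k >= 1).

    Assumption: the persistence module is a direct sum of interval
    modules, given here as (birth, death) pairs.
    """
    # Each bar contributes a "tent" of height min(t - b, d - t), clipped at 0.
    heights = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in intervals),
        reverse=True,
    )
    # The k-th largest tent height; 0 if fewer than k bars exist.
    return heights[k - 1] if k <= len(heights) else 0.0


if __name__ == "__main__":
    bars = [(0.0, 4.0), (1.0, 3.0), (2.0, 6.0)]  # hypothetical barcode
    for k in (1, 2, 3):
        print(k, [landscape(bars, k, t) for t in (1.0, 2.0, 3.0, 4.0)])
```

+Because each landscape function is an ordinary real-valued function, averages over several barcodes can be taken pointwise, which is the property that makes the representation convenient for statistics.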
+
+== Applications ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Topological_data_analysis-8.md b/data/en.wikipedia.org/wiki/Topological_data_analysis-8.md
new file mode 100644
index 000000000..c23d934ea
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Topological_data_analysis-8.md
@@ -0,0 +1,85 @@
+---
+title: "Topological data analysis"
+chunk: 9/9
+source: "https://en.wikipedia.org/wiki/Topological_data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:17.950679+00:00"
+instance: "kb-cron"
+---
+
+=== Classification of applications ===
+More than one way exists to classify the applications of TDA. Perhaps the most natural way is by field. A very incomplete list of successful applications includes data skeletonization, shape study, graph reconstruction, image analysis, materials, progression analysis of disease, sensor networks, signal analysis, the cosmic web, complex networks, fractal geometry, viral evolution, propagation of contagions on networks, bacteria classification using molecular spectroscopy, super-resolution microscopy, hyperspectral imaging in physical chemistry, remote sensing, feature selection, and early warning signs of financial crashes.
+
+Another way, due to G. Carlsson, is to distinguish two techniques: one is the study of homological invariants of data on individual data sets, and the other is the use of homological invariants in the study of databases where the data points themselves have geometric structure.
+
+=== Impact on mathematics ===
+Topological data analysis and persistent homology have had impacts on Morse theory. Morse theory has played a very important role in the theory of TDA, including in computation. Some work in persistent homology has extended results about Morse functions to tame functions or even to continuous functions. A forgotten result of R. Deheuvels, obtained long before the invention of persistent homology, extends Morse theory to all continuous functions.
+One recent result is that the category of Reeb graphs is equivalent to a particular class of cosheaf. This is motivated by theoretical work in TDA, since the Reeb graph is related to Morse theory and MAPPER is derived from it. The proof of this theorem relies on the interleaving distance.
+Persistent homology is closely related to spectral sequences. In particular, the algorithm bringing a filtered complex to its canonical form permits much faster calculation of spectral sequences than the standard procedure of calculating the $E_{p,q}^{r}$ groups page by page. Zigzag persistence may turn out to be of theoretical importance to spectral sequences.
+
+=== DONUT: A Database of TDA Applications ===
+The Database of Original & Non-Theoretical Uses of Topology (DONUT) is a database of scholarly articles featuring practical applications of topological data analysis to various areas of science. DONUT was started in 2017 by Barbara Giunti, Janis Lazovskis, and Bastian Rieck, and as of October 2023 it contained 447 articles. DONUT was featured in the November 2023 issue of the Notices of the American Mathematical Society.
+
+=== Applications to Adversarial ML ===
+The stability of topological features under small perturbations has been applied to make graph neural networks robust against adversaries. Arafat et al. proposed a robustness framework that systematically integrates both local and global topological graph feature representations, the impact of which is controlled by a robust regularized topological loss. Given the attacker's budget, they derived stability guarantees on the node representations, establishing an important connection between topological stability and adversarial ML.
+
+== See also ==
+Dimensionality reduction
+Data mining
+Computer vision
+Computational topology
+Discrete Morse theory
+Shape analysis (digital geometry)
+Size theory
+Algebraic topology
+Topological deep learning
+
+== References ==
+
+== Further reading ==
+
+=== Brief Introductions ===
+Lesnick, Michael (2013). "Studying the Shape of Data Using Topology". Institute for Advanced Study.
+Source Material for Topological Data Analysis by Mikael Vejdemo-Johansson
+
+=== Monograph ===
+Oudot, Steve Y. (2015). Persistence Theory: From Quiver Representations to Data Analysis. American Mathematical Society. ISBN 978-1-4704-2545-6.
+
+=== Textbooks on Topology ===
+Hatcher, Allen (2002). Algebraic Topology. Cambridge University Press. ISBN 0-521-79540-0. Available for Download
+Edelsbrunner, Herbert; Harer, John (2010). Computational Topology: An Introduction. American Mathematical Society. ISBN 978-0-8218-4925-5.
+Elementary Applied Topology, by Robert Ghrist
+
+== External links ==
+Database of Original & Non-Theoretical Uses of Topology (DONUT)
+
+=== Video Lectures ===
+Introduction to Persistent Homology and Topology for Data Analysis, by Matthew Wright
+The Shape of Data, by Gunnar Carlsson
+
+=== Other Resources of TDA ===
+Applied Topology, by Stanford
+Applied algebraic topology research network Archived 2016-01-31 at the Wayback Machine, by the Institute for Mathematics and its Applications
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Twyman's_law-0.md b/data/en.wikipedia.org/wiki/Twyman's_law-0.md
index a3646fcf8..0cd521ffb 100644
--- a/data/en.wikipedia.org/wiki/Twyman's_law-0.md
+++ b/data/en.wikipedia.org/wiki/Twyman's_law-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Twyman's_law"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:29:26.280989+00:00"
+date_saved: "2026-05-05T09:55:19.200428+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Underdetermination-0.md b/data/en.wikipedia.org/wiki/Underdetermination-0.md
new file mode 100644
index 000000000..e1aece1a6
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Underdetermination-0.md
@@ -0,0 +1,27 @@
+---
+title: "Underdetermination"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Underdetermination"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:24.490868+00:00"
+instance: "kb-cron"
+---
+
+In the philosophy of science, underdetermination or the underdetermination of theory by data (sometimes abbreviated UTD) is the idea that evidence available to us at a given time may be insufficient to determine what beliefs we should hold in response to it. The underdetermination thesis states that all evidence necessarily underdetermines any scientific theory.
+Underdetermination exists when available evidence is insufficient to identify which belief one should hold about that evidence. For example, if all that was known was that exactly $10 was spent on apples and oranges, and that apples cost $1 and oranges $2, then one would know enough to eliminate some possibilities (e.g., 6 oranges could not have been purchased), but one would not have enough evidence to know which specific combination of apples and oranges was purchased. In this example, one would say that belief in what combination was purchased is underdetermined by the available evidence.
+In contrast, overdetermination in philosophy of science means that more evidence is available than is necessary to justify a conclusion.
+
+== Origin ==
+Ancient Greek skeptics argued for equipollence, the view that reasons for and against claims are equally balanced. This captures at least one sense of saying that the claims themselves are underdetermined.
+Underdetermination, again under different labels, arises in the modern period in the work of René Descartes. Among other skeptical arguments, Descartes presents two arguments involving underdetermination. His dream argument points out that experiences perceived while dreaming (for example, falling) do not necessarily contain sufficient information to deduce the true situation (being in bed). He concluded that since one cannot always distinguish dreams from reality, one cannot rule out the possibility that one is dreaming rather than having veridical experiences; thus the conclusion that one is having a veridical experience is underdetermined. His demon argument posits that all of one's experiences and thoughts might be manipulated by a very powerful and deceptive "evil demon". Once again, so long as the perceived reality appears internally consistent within the limits of one's ability to tell, the situation is indistinguishable from reality and one cannot logically determine that such a demon does not exist.
+
+== Underdetermination and evidence ==
+To show that a conclusion is underdetermined, one must show that there is a rival conclusion that is equally well supported by the standards of evidence.  A trivial example of underdetermination is the addition of the statement "whenever we look for evidence" (or more generally, any statement which cannot be falsified). For example, the conclusion "objects near Earth fall toward it when dropped" might be opposed by "objects near Earth fall toward it when dropped but only when one checks to see that they do." Since one may append this to any conclusion, all conclusions are at least trivially underdetermined.  If one considers such statements to be illegitimate, e.g. by applying Occam's Razor, then such "tricks" are not considered demonstrations of underdetermination.
+This concept also applies to scientific theories: for example, it is similarly trivial to find situations that a theory does not address. For example, classical mechanics did not distinguish between non-accelerating reference frames. As a result, any conclusion about such a reference frame was underdetermined; it was equally consistent with the theory to say that the Solar System is at rest, as it is to say that it moves at any constant velocity in any particular direction. Newton himself stated that these possibilities were indistinguishable. More generally, evidence may not always be sufficient to distinguish between competing theories (or to determine a different theory that will unify both), as is the case with general relativity and quantum mechanics.
+Another example is provided by Johann Wolfgang von Goethe's 1810 book Theory of Colours: "Newton believed that with the help of his prism experiments, he could prove that sunlight was composed of variously coloured rays of light. Goethe showed that this step from observation to theory is more problematic than Newton wanted to admit. By insisting that the step to theory is not forced upon us by the phenomena, Goethe revealed our own free, creative contribution to theory construction. And Goethe's insight is surprisingly significant, because he correctly claimed that all of the results of Newton's prism experiments fit a theoretical alternative equally well. If this is correct, then by suggesting an alternative to a well-established physical theory, Goethe developed the problem of underdetermination a century before Duhem and Quine's famous argument." (Mueller, 2016) Hermann von Helmholtz says of this, "And I for one do not know how anyone, regardless of what his views about colours are, can deny that the theory in itself is fully consequent, that its assumptions, once granted, explain the facts treated completely and indeed simply".
+Experimental violations of Bell inequalities show that there are some limitations to underdetermination: every theory exhibiting local realism and statistical independence was disproved by these tests. Analogous limitations follow from Kochen–Specker experiments. These tests employ only correlations between results of measurements and therefore are able to bypass the issue of theory-ladenness of observation.
+
+== Arguments involving underdetermination ==
+Arguments involving underdetermination attempt to show that there is no reason to believe some conclusion because it is underdetermined by the evidence.  Then, if the evidence available at a particular time can be equally well explained by at least one other hypothesis, there is no reason to believe it rather than the equally supported rival, which can be considered observationally equivalent (although many other hypotheses may still be eliminated).
+Because arguments involving underdetermination involve both a claim about what the evidence is and that such evidence underdetermines a conclusion, it is often useful to separate these two claims within the underdetermination argument as follows:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Underdetermination-1.md b/data/en.wikipedia.org/wiki/Underdetermination-1.md
new file mode 100644
index 000000000..8dbe8266b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Underdetermination-1.md
@@ -0,0 +1,52 @@
+---
+title: "Underdetermination"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Underdetermination"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:56:24.490868+00:00"
+instance: "kb-cron"
+---
+
+1. All the available evidence of a certain type underdetermines which of several rival conclusions is correct.
+2. Only evidence of that type is relevant to believing one of these conclusions.
+3. Therefore, there is no evidence for believing one among the rival conclusions.
+The first premise makes the claim that a theory is underdetermined. The second says that rational decision (i.e. using available evidence) depends upon insufficient evidence.
+
+=== Epistemological problem of the indeterminacy of data to theory ===
+Any phenomenon can be explained by a multiplicity of hypotheses.  How, then, can data ever be sufficient to prove a theory?  This is the "epistemological problem of the indeterminacy of data to theory".
+The poverty of the stimulus argument and W.V.O. Quine's 1960 'Gavagai' example are perhaps the most commented-upon variants of the epistemological problem of the indeterminacy of data to theory.
+
+=== General skeptical arguments ===
+Some skeptical arguments appeal to the fact that no possible evidence could be incompatible with 'skeptical hypotheses' like the maintenance of a complex illusion by Descartes' evil demon or (in a modern version) the machines who run the Matrix. A skeptic may argue that this undermines any claims to knowledge, or even (by internalist definitions), justification.
+Philosophers have found this argument very powerful. Hume felt it was unanswerable, but observed that it was in practice impossible to accept its conclusions. Influenced by this, Kant held that while the nature of the 'noumenal' world was indeed unknowable, we could aspire to knowledge of the 'phenomenal' world. A similar response has been advocated by modern anti-realists.
+Underdetermined ideas are not implied to be incorrect (taking into account present evidence); rather, we cannot know if they are correct.
+
+=== Philosophy of science ===
+In the philosophy of science, underdetermination is often presented as a problem for scientific realism, which holds that we have reason to believe in the entities that scientific theories talk about even when they are not directly observable.  One such argument proceeds as follows (to be compared to the previous one):  
+
+1. All the available observational evidence for such entities underdetermines the claims of a scientific theory about such entities.
+2. Only the observational evidence is relevant to believing a scientific theory.
+3. Therefore, there is no evidence for believing what scientific theories say about such entities.
+Particular responses to this argument attack both the first and the second premise (1 and 2).  It is argued against the first premise that the underdetermination must be strong and/or inductive. It is argued against the second premise that there is evidence for a theory's truth besides observations; for example, it is argued that simplicity, explanatory power or some other feature of a theory is evidence for it over its rivals.
+A more general response from the scientific realist is to argue that underdetermination is no special problem for science, because, as indicated earlier in this article, all knowledge that is directly or indirectly supported by evidence suffers from it—for example, conjectures concerning unobserved observables. It is therefore too powerful an argument to have any significance in the philosophy of science, since it does not cast doubt uniquely on conjectured unobservables.
+
+== See also ==
+Indeterminacy (philosophy)
+Poverty of the stimulus
+Reference class problem
+Scientific method
+Instrumentalism
+Confirmation holism
+Equifinality
+Metaphysics
+Occam's razor
+Observational equivalence
+Overdetermination
+Philosophical skepticism
+
+== Notes and references ==
+
+== External links ==
+Underdetermination and the Claims of Science by P. D. Magnus
+Stanford, Kyle (Winter 2000), Edward N. Zalta (ed.), Underdetermination of Scientific Theory, The Stanford Encyclopedia of Philosophy
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Unicity_(data_analysis)-0.md b/data/en.wikipedia.org/wiki/Unicity_(data_analysis)-0.md
new file mode 100644
index 000000000..ccb4dfd99
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Unicity_(data_analysis)-0.md
@@ -0,0 +1,182 @@
+---
+title: "Unicity (data analysis)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Unicity_(data_analysis)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:20.371242+00:00"
+instance: "kb-cron"
+---
+
+Unicity ($\varepsilon _{p}$) is a risk metric for measuring the re-identifiability of high-dimensional anonymous data. First introduced in 2013, unicity is measured by the number of points p needed to uniquely identify an individual in a data set. The fewer points needed, the more unique the traces are and the easier they would be to re-identify using outside information.
+In a high-dimensional, human behavioural data set, such as mobile phone meta-data, there exist potentially thousands of different records for each person. In the case of mobile phone meta-data, credit card transaction histories and many other types of personal data, this information includes the time and location of an individual.
+In research, unicity is widely used to illustrate the re-identifiability of anonymous data sets. In 2013, researchers from the MIT Media Lab showed that only 4 points were needed to uniquely identify 95% of individual trajectories in a de-identified data set of 1.5 million mobility trajectories. These points were location-time pairs that appeared with a resolution of 1 hour and 0.15 km² to 15 km². These results were shown to hold true for credit card transaction data as well, with 4 points being enough to re-identify 90% of trajectories. Further research studied the unicity of the apps installed by people on their smartphones, the trajectories of vehicles, mobile phone data from Boston and Singapore, and public transport data in Singapore obtained from smartcards.
+
+
+== Measuring unicity ==
+Unicity ($\varepsilon _{p}$) is formally defined as the expected value of the fraction of uniquely identifiable trajectories, given p points selected from those trajectories uniformly at random. A full computation of $\varepsilon _{p}$ for a data set $D$ requires picking p points uniformly at random from each trajectory $T_{i}\in D$, and then checking whether or not any other trajectory also contains those p points. Averaging over all possible sets of p points for each trajectory results in a value for $\varepsilon _{p}$. This is usually prohibitively expensive, as it requires considering every possible set of p points for each trajectory in the data set, and trajectories sometimes contain thousands of points.
+Instead, unicity is usually estimated using sampling techniques. Specifically, given a data set $D$, the estimated unicity is computed by sampling from $D$ a fraction of the trajectories $S$ and then checking whether each of the trajectories $T_{j}\in S$ is unique in $D$ given p randomly selected points from each $T_{j}$. The fraction of $S$ that is uniquely identifiable is then the unicity estimate.
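+The sampling estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: each trajectory is modelled as a set of hashable, sortable points (for example (hour, cell) tuples), a trajectory with fewer than p points contributes all of its points, and the names `estimate_unicity`, `sample_fraction` and `seed` are hypothetical rather than taken from any published implementation.

```python
import random
from typing import FrozenSet, Sequence, Tuple

Point = Tuple[int, str]          # assumed shape: (time slot, location id)
Trajectory = FrozenSet[Point]


def estimate_unicity(dataset: Sequence[Trajectory], p: int,
                     sample_fraction: float = 1.0, seed: int = 0) -> float:
    """Estimate unicity by testing a random sample S of the trajectories."""
    rng = random.Random(seed)
    n = len(dataset)
    sample_size = max(1, min(n, round(sample_fraction * n)))
    sampled = rng.sample(range(n), sample_size)      # indices of S

    unique = 0
    for j in sampled:
        traj = dataset[j]
        k = min(p, len(traj))                        # assumption for short trajectories
        points = set(rng.sample(sorted(traj), k))    # p points, uniformly at random
        # Count trajectories in the full data set D that contain all chosen points.
        matches = sum(1 for other in dataset if points <= other)
        if matches == 1:                             # only the trajectory itself matches
            unique += 1
    return unique / sample_size


if __name__ == "__main__":
    data = [
        frozenset({(1, "A"), (2, "B")}),
        frozenset({(1, "A"), (2, "C")}),
        frozenset({(1, "A"), (2, "B")}),
    ]
    print(estimate_unicity(data, p=2))
```

+The scan over the whole data set for every sampled trajectory is the simplest possible matching step; for large data sets an index from points to trajectories would normally replace it.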
+
+
+== See also ==
+Quasi-identifier
+Personally Identifiable Information
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-0.md b/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-0.md
new file mode 100644
index 000000000..4563ac778
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-0.md
@@ -0,0 +1,287 @@
+---
+title: "Vietoris–Rips filtration"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Vietoris–Rips_filtration"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:21.553662+00:00"
+instance: "kb-cron"
+---
+
+In topological data analysis, the Vietoris–Rips filtration (sometimes shortened to "Rips filtration") is the collection of nested Vietoris–Rips complexes on a metric space created by taking the sequence of Vietoris–Rips complexes over an increasing scale parameter. Often, the Vietoris–Rips filtration is used to create a discrete, simplicial model on point cloud data embedded in an ambient metric space. The Vietoris–Rips filtration is a multiscale extension of the Vietoris–Rips complex that enables researchers to detect and track the persistence of topological features, over a range of parameters, by way of computing the persistent homology of the entire filtration. It is named after Leopold Vietoris and Eliyahu Rips.
+
+== Definition ==
+The Vietoris–Rips filtration is the nested collection of Vietoris–Rips complexes indexed by an increasing scale parameter. The Vietoris–Rips complex is a classical construction in mathematics that dates back to a 1927 paper of Leopold Vietoris, though it was independently considered by Eliyahu Rips in the study of hyperbolic groups, as noted by Mikhail Gromov in the 1980s. The conjoined name "Vietoris–Rips" is due to Jean-Claude Hausmann.
+
+Given a metric space $X$ and a scale parameter (sometimes called the threshold or distance parameter) $r\in [0,\infty )$, the Vietoris–Rips complex (with respect to $r$) is defined as $\mathbf {VR} _{r}(X)=\{S\subseteq X\mid S{\text{ finite}};\operatorname {diam} S\leq r;S\neq \emptyset \}$, where $\operatorname {diam} S$ is the diameter, i.e. the maximum distance between points lying in $S$.
+Observe that if $r\leq s\in [0,\infty )$, there is a simplicial inclusion map $\mathbf {VR} _{r}(X)\hookrightarrow \mathbf {VR} _{s}(X)$. The Vietoris–Rips filtration is the nested collection of complexes $\mathbf {VR} _{r}(X)$:
+
+$$\mathbf {VR} (X)=\{\mathbf {VR} _{r}(X)\}_{r\in [0,\infty )}$$
+
+If the non-negative real numbers $[0,\infty)$ are viewed as a posetal category via the $\leq$ relation, then the Vietoris–Rips filtration can be viewed as a functor $\mathbf{VR}(X):[0,\infty)\to\mathbf{Simp}$ valued in the category of simplicial complexes and simplicial maps, where the morphisms (i.e., relations in the poset) in the source category induce inclusion maps among the complexes. Note that the category of simplicial complexes may be viewed as a subcategory of $\mathbf{Top}$, the category of topological spaces, by post-composing with the geometric realization functor.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-1.md b/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-1.md
new file mode 100644
index 000000000..d18a6bf01
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-1.md
@@ -0,0 +1,490 @@
+---
+title: "Vietoris–Rips filtration"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Vietoris–Rips_filtration"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:21.553662+00:00"
+instance: "kb-cron"
+---
+
+== Properties ==
+The size of a filtration refers to the number of simplices in the largest complex, assuming the underlying metric space is finite. The size of the $k$-skeleton of the Vietoris–Rips filtration, i.e., the number of simplices of dimension at most $k$, is known to be $O\left(n^{k+1}\right)$, where $n$ is the number of points. The complete skeleton has precisely $2^{n}-1$ simplices, one for each non-empty subset of points. Since this is exponential, researchers usually only compute the skeleton of the Vietoris–Rips filtration up to small values of $k$.
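+To make these counts concrete, the following worked computation uses a hypothetical point cloud with $n=10$ points and $k=2$ (both values are chosen purely for illustration). In the largest complex every non-empty subset of points is a simplex, so the number of $j$-dimensional simplices is $\binom{n}{j+1}$:
+
+```latex
+\begin{align*}
+\#\{\text{simplices of dimension}\le 2\} &= \binom{10}{1}+\binom{10}{2}+\binom{10}{3} = 10+45+120 = 175 \le 10^{3}, \\
+\#\{\text{all simplices}\} &= \sum_{j=0}^{9}\binom{10}{j+1} = 2^{10}-1 = 1023.
+\end{align*}
+```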
+When the underlying metric space is finite, the Vietoris–Rips filtration is sometimes referred to as essentially discrete, meaning that there exists some terminal or maximum scale parameter $r_{\max}\in[0,\infty)$ such that $\mathbf{VR}_{s}(X)=\mathbf{VR}_{r_{\max}}(X)$ for all $s\geq r_{\max}$, and furthermore that the inclusion map $\mathbf{VR}_{s\to t}(X):\mathbf{VR}_{s}(X)\hookrightarrow\mathbf{VR}_{t}(X)$ is an isomorphism for all but finitely many parameters $s\leq t$. In other words, when the underlying metric space is finite, the Vietoris–Rips filtration has a largest complex, and the complex changes at only a finite number of steps. The latter implies that the Vietoris–Rips filtration on a finite metric space can be considered as indexed over a discrete set such as $\mathbb{N}$, by restricting the filtration to the scale parameters at which the filtration changes, then relabeling the complexes using the natural numbers.
+An explicit bound can also be given for the number of steps at which the Vietoris–Rips filtration changes. The Vietoris–Rips complex is a clique complex, meaning it is entirely determined by its 1-skeleton. Therefore the number of steps at which the Vietoris–Rips filtration changes is bounded by the number of edges in the largest complex. The number of edges in the largest complex is $\binom{n}{2}=n(n-1)/2$, since every pair of the $n$ vertices is joined by an edge. Therefore the Vietoris–Rips filtration changes at $O(n^{2})$ steps, where $O(-)$ denotes an asymptotic upper bound.
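+The reindexing over a discrete set can be written out explicitly. In the sketch below, $d_{1}<\cdots<d_{m}$ is notation introduced here (it does not appear in the source) for the distinct positive pairwise distances in a finite metric space $X$ with $n$ points, with the convention $d_{m+1}=\infty$:
+
+```latex
+\begin{gather*}
+0 = d_{0} < d_{1} < \cdots < d_{m}, \qquad m \le \binom{n}{2} = \frac{n(n-1)}{2}, \\
+\mathbf{VR}_{r}(X) = \mathbf{VR}_{d_{i}}(X) \quad \text{for } d_{i} \le r < d_{i+1}, \\
+\mathbf{VR}_{d_{0}}(X) \subseteq \mathbf{VR}_{d_{1}}(X) \subseteq \cdots \subseteq \mathbf{VR}_{d_{m}}(X).
+\end{gather*}
+```
+
+Under this notation, $d_{m}=\operatorname{diam} X$ can serve as the terminal scale $r_{\max}$. For instance, a space with $n=10$ points changes at no more than $45$ positive scales.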
+For points in Euclidean space, the Vietoris–Rips filtration is an approximation to the Čech filtration, in the sense of the interleaving distance. This follows from the fact that for any scale parameter $\alpha$, the Vietoris–Rips and Čech complexes on a finite set $X$ of points in Euclidean space satisfy the inclusion relationship $\mathbf{VR}_{\alpha}(X)\subseteq\operatorname{\check{C}ech}_{\sqrt{2}\alpha}(X)\subseteq\mathbf{VR}_{\sqrt{2}\alpha}(X)$, which is sometimes referred to as the Vietoris–Rips Lemma. In general metric spaces, a straightforward application of the triangle inequality shows that $\mathbf{VR}_{\alpha}(X)\subseteq\operatorname{\check{C}ech}_{2\alpha}(X)\subseteq\mathbf{VR}_{2\alpha}(X)$ for any scale parameter $\alpha$.
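+As a sanity check of the Euclidean inclusions, consider a hypothetical example: three points $x_{1},x_{2},x_{3}$ in the plane forming an equilateral triangle of side length $\alpha$. The computation assumes the convention, which is not stated in this section, that the Čech complex at scale $r$ is the nerve of closed balls of radius $r/2$ around the points:
+
+```latex
+\begin{align*}
+\operatorname{diam}\{x_{1},x_{2},x_{3}\} = \alpha
+  &\implies \{x_{1},x_{2},x_{3}\}\in\mathbf{VR}_{\alpha}(X), \\
+\text{circumradius} = \frac{\alpha}{\sqrt{3}}
+  &\implies \Bigl(\{x_{1},x_{2},x_{3}\}\in\operatorname{\check{C}ech}_{r}(X)
+     \iff \frac{r}{2}\ge\frac{\alpha}{\sqrt{3}}
+     \iff r\ge\frac{2}{\sqrt{3}}\,\alpha\approx 1.155\,\alpha\Bigr), \\
+\frac{2}{\sqrt{3}}\le\sqrt{2}
+  &\implies \{x_{1},x_{2},x_{3}\}\in\operatorname{\check{C}ech}_{\sqrt{2}\alpha}(X).
+\end{align*}
+```
+
+Under this convention the triangle is filled in the Vietoris–Rips complex at scale $\alpha$ but in the Čech complex only from scale roughly $1.155\,\alpha$, which lies within the factor $\sqrt{2}$ allowed by the lemma.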
+
+== Variants ==
+
+=== Approximations ===
+Since the Vietoris–Rips filtration has an exponential number of simplices in its complete skeleton, a significant amount of research has been done on approximating the persistent homology of the Vietoris–Rips filtration using constructions of smaller size. The first work in this direction was published by computer scientist Donald Sheehy in 2012, who showed how to construct a filtration of $O(n)$ size in $O(n\log n)$ time that approximates the persistent homology of the Vietoris–Rips filtration to a desired margin of error. This type of filtration is known as a Sparse Vietoris–Rips filtration, since it removes points from the standard Vietoris–Rips filtration using ideas from computational geometry related to geometric spanners. Since then, there have been several more efficient methods developed for approximating the Vietoris–Rips filtration, mostly using the ideas of Sheehy, but also building upon approximation schemes developed for the Čech and Delaunay filtrations.
+
+=== Multiparameter Extensions ===
+It is known that persistent homology can be sensitive to outliers in the underlying data set. To remedy this, in 2009 Gunnar Carlsson and Afra Zomorodian proposed a multidimensional version of persistence that considers filtrations with respect to multiple parameters, such as scale and density.
+To that end, several multiparameter extensions of the Vietoris–Rips filtration have been developed.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-2.md b/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-2.md
new file mode 100644
index 000000000..825554ca5
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Vietoris–Rips_filtration-2.md
@@ -0,0 +1,228 @@
+---
+title: "Vietoris–Rips filtration"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Vietoris–Rips_filtration"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:21.553662+00:00"
+instance: "kb-cron"
+---
+
+The Degree-Rips bifiltration extends the Vietoris–Rips filtration by constructing a subgraph of the 1-skeleton of each complex in the Vietoris–Rips filtration, restricting only to vertices whose degree is at least a given parameter $a\in[0,\infty)$, then building the clique complex on that subgraph. The degree of a vertex encodes density information about the data, because it quantifies how "central" that point is by way of how many other vertices it is connected to. The collection over all degree parameters $a$ defines a filtration of each complex in the Vietoris–Rips filtration, where the complexes get smaller as $a$ increases. Altogether, this defines a 2-parameter extension of the Vietoris–Rips filtration, by considering the collection of bi-filtered complexes over all parameters $(a,r)\in\mathbb{R}^{\operatorname{op}}\times\mathbb{R}$, where "op" denotes the opposite poset.
+The Function-Rips bifiltration extends the Vietoris–Rips filtration by bifiltering each complex according to the superlevel-sets of some function $\gamma:X\to\mathbb{R}$, where $\gamma$ can be a density function, an eccentricity function, or any other function. Namely, each complex is defined via $\mathbf{F}\text{-}\mathbf{VR}_{a,r}(\gamma)=\mathbf{VR}_{r}(\gamma^{-1}[a,\infty))$, which yields a bifiltration indexed over $\mathbb{R}^{\operatorname{op}}\times[0,\infty)$.
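+A small worked example may help. The data here are hypothetical and chosen only for illustration: let $X=\{x_{1},x_{2},x_{3}\}$ be the points $0$, $1$ and $3$ on the real line, with $\gamma(x_{1})=2$, $\gamma(x_{2})=1$ and $\gamma(x_{3})=3$. Then $\gamma^{-1}[2,\infty)=\{x_{1},x_{3}\}$ and $\gamma^{-1}[1,\infty)=X$, so:
+
+```latex
+\begin{align*}
+\mathbf{F}\text{-}\mathbf{VR}_{2,1}(\gamma) &= \mathbf{VR}_{1}(\{x_{1},x_{3}\}) = \{\{x_{1}\},\{x_{3}\}\} \\
+\mathbf{F}\text{-}\mathbf{VR}_{2,3}(\gamma) &= \mathbf{VR}_{3}(\{x_{1},x_{3}\}) = \{\{x_{1}\},\{x_{3}\},\{x_{1},x_{3}\}\} \\
+\mathbf{F}\text{-}\mathbf{VR}_{1,1}(\gamma) &= \mathbf{VR}_{1}(X) = \{\{x_{1}\},\{x_{2}\},\{x_{3}\},\{x_{1},x_{2}\}\} \\
+\mathbf{F}\text{-}\mathbf{VR}_{1,3}(\gamma) &= \mathbf{VR}_{3}(X) = \{S\subseteq X\mid S\neq\emptyset\}
+\end{align*}
+```
+
+The complexes grow as $r$ increases and shrink as $a$ increases, which is why the first index is taken in the opposite poset $\mathbb{R}^{\operatorname{op}}$.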
+The subdivision-Rips bifiltration extends the Vietoris–Rips filtration by taking the barycentric subdivision of each complex in the Vietoris–Rips filtration, then filtering these complexes by dimension of each flag. Namely, the barycentric subdivision of a simplicial complex is the abstract simplicial complex defined using flags of simplices in the underlying complex, where a flag (sometimes called a chain) is a nested series of simplices $\sigma_{0}\subset\cdots\subset\sigma_{m}$. Then given the barycentric subdivision of a complex, one can filter it by taking the subcomplex spanned by vertices corresponding to simplices in the original complex of some minimum dimension $k$. The collection of all such complexes yields a bifiltration indexed over $[0,\infty)^{\operatorname{op}}\times[0,\infty)$.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Visual_inspection-0.md b/data/en.wikipedia.org/wiki/Visual_inspection-0.md
new file mode 100644
index 000000000..335908dca
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Visual_inspection-0.md
@@ -0,0 +1,36 @@
+---
+title: "Visual inspection"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Visual_inspection"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T09:55:22.749258+00:00"
+instance: "kb-cron"
+---
+
+Visual inspection is a common method of quality control, data acquisition, and data analysis.
+Visual inspection, as used in the maintenance of facilities, means inspecting equipment and structures using any or all of the unaided human senses, such as vision, hearing, touch and smell, and/or any non-specialized inspection equipment.
+Inspections requiring ultrasonic, X-ray, infrared or similar equipment are not typically regarded as visual inspection, as these inspection methodologies require specialized equipment, training and certification.
+
+
+== Quality control ==
+A study of the visual inspection of small integrated circuits found that the modal duration of eye fixations of trained inspectors was about 200 ms. The most accurate inspectors made the fewest eye fixations and were the fastest. When the same chip was judged more than once by an individual inspector, the consistency of judgment was very high, whereas the consistency between inspectors was somewhat lower. Variation by a factor of six in inspection speed led to variation of less than a factor of two in inspection accuracy. Visual inspection had a false positive rate of 2% and a false negative rate of 23%.
+
+
+== Humorous terminology ==
+To do an eyeball search is to look for something specific in a mass of code or data with one's own eyes, as opposed to using some sort of pattern matching software like grep or any other automated search tool. Also known as vgrep or ogrep, i.e., "visual/optical grep". See also vdiff.
+"Eyeballing" is the most common and readily available method of initial data assessment. This method is effective for identifying patterns or anomalies in complex data but can be time-intensive and error-prone. Although low-cost and adaptable, its efficiency and ROI often fall short compared to automated tools, which offer greater scalability and consistency. However, switching from manual visual inspection to automated methods depends on the task's complexity, scale, and the balance between upfront costs and long-term efficiency.
+Experts in pattern recognition maintain that the "eyeball" technique is still the most effective procedure for searching arbitrary, possibly unknown structures in data.
+In the military, applying this sort of search to real-world terrain is often referred to as using the "Mark I Eyeball" device (pronounced "Mark One Eyeball"), a term the U.S. military adopted in the 1950s. The term is an allusion to military nomenclature, "Mark I" being the first version of a military vehicle or weapon.
+
+
+== See also ==
+Automated optical inspection
+Inspection
+Inspection (medicine)
+Statistical graphics
+Visual search
+Visual comparison
\ No newline at end of file