Scrape wikipedia-science: 296 new, 864 updated, 1195 total (kb-cron)

2026-05-04 20:56:24 -07:00 · 2026-05-04 20:56:24 -07:00 · 4280f692cd
commit 4280f692cd
parent 3858c74604
46 changed files with 1473 additions and 0 deletions
--- a/_index.db
+++ b/_index.db
--- a/data/en.wikipedia.org/wiki/Actuarial_science-0.md
+++ b/data/en.wikipedia.org/wiki/Actuarial_science-0.md
@ -0,0 +1,24 @@
+---
+title: "Actuarial science"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Actuarial_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:11.921022+00:00"
+instance: "kb-cron"
+---
+
+Actuarial science is the discipline that applies mathematical and statistical methods to assess risk in insurance, pension, finance, investment, psychology, medicine, and other industries and professions.
+Actuaries are professionals trained in this discipline. In many countries, actuaries must demonstrate their competence by passing a series of rigorous professional examinations in fields such as probability and predictive analysis. According to U.S. News & World Report, their work often involves using mathematics to identify risk so that it can be mitigated.
+Actuarial science includes a number of interrelated subjects, including mathematics, probability theory, statistics, finance, economics, financial accounting and computer science. Historically, actuarial science used deterministic models in the construction of tables and premiums. The science has gone through revolutionary changes since the 1980s due to the proliferation of high-speed computers and the union of stochastic actuarial models with modern financial theory.
+Many universities have undergraduate and graduate degree programs in actuarial science. In 2010, a study published by job search website CareerCast ranked actuary as the #1 job in the United States. The study used five key criteria to rank jobs: environment, income, employment outlook, physical demands, and stress. In 2024, U.S. News & World Report ranked actuary as the third-best job in the business sector and the eighth-best job in STEM.
+
+== Subfields ==
+
+=== Life insurance, pensions and healthcare ===
+Actuarial science became a formal mathematical discipline in the late 17th century with the increased demand for long-term insurance coverage such as burial, life insurance, and annuities. These long-term coverages required that money be set aside to pay future benefits, such as annuity and death benefits, many years into the future. This requires estimating future contingent events, such as the rates of mortality by age, as well as developing mathematical techniques for discounting the value of funds set aside and invested. This led to an important actuarial concept, the present value of a future sum. Certain aspects of the actuarial methods for discounting pension funds have come under criticism from modern financial economics.
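The present value of a future sum can be illustrated with a short Python sketch. It assumes a single payment and a constant annual interest rate compounded once a year; the function name `present_value` and the figures in the example are illustrative, not taken from any actuarial standard.

```python
def present_value(amount: float, annual_rate: float, years: float) -> float:
    """Return today's value of `amount` paid `years` from now.

    Assumes a constant annual rate, compounded once per year (illustrative only).
    """
    if annual_rate <= -1.0:
        raise ValueError("annual_rate must be greater than -100%")
    discount_factor = (1.0 + annual_rate) ** -years
    return amount * discount_factor


if __name__ == "__main__":
    # Hypothetical example: a 1,000 death benefit expected in 10 years, discounted at 5%.
    print(round(present_value(1000.0, 0.05, 10), 2))  # prints 613.91
```

Under these assumptions, setting aside about 613.91 today at 5% grows to 1,000 in ten years. In practice an actuary would also weight each payment by the probability that it is actually made, which is where mortality rates enter.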
+
+In traditional life insurance, actuarial science focuses on the analysis of mortality, the production of life tables, and the application of compound interest to produce life insurance, annuities and endowment policies. Contemporary life insurance programs have been extended to include credit and mortgage insurance, key person insurance for small businesses, long-term care insurance and health savings accounts.
+In health insurance, including insurance provided directly by employers, and in social insurance, actuarial science focuses on the analysis of rates of disability, morbidity, mortality, fertility and other contingencies. The effects of consumer choice and of the geographical distribution of the utilization of medical services, procedures, drugs and therapies are also of great importance. These factors underlay the development of the Resource-Based Relative Value Scale (RBRVS) in a multidisciplinary study at Harvard. Actuarial science also aids in the design of benefit structures and reimbursement standards, and in assessing the effects of proposed government standards on the cost of healthcare.
+In the pension industry, actuarial methods are used to measure the costs of alternative strategies for the design, funding, accounting, administration, and maintenance or redesign of pension plans. These strategies are greatly influenced by short-term and long-term bond rates; the funded status of the pension and benefit arrangements; collective bargaining; the employer's old, new and foreign competitors; the changing demographics of the workforce; changes in the Internal Revenue Code; changes in the attitude of the Internal Revenue Service regarding the calculation of surpluses; and, equally importantly, both short-term and long-term financial and economic trends. In mergers and acquisitions, several pension plans commonly have to be combined or at least administered on an equitable basis. When benefit changes occur, old and new benefit plans have to be blended in a way that satisfies new social demands and various government discrimination test calculations, and that gives employees and retirees understandable choices and transition paths. Benefit plan liabilities have to be properly valued, reflecting both the benefits earned for past service and the benefits for future service. Finally, funding schemes have to be developed that are manageable and that satisfy the standards board or regulators of the appropriate country, such as the Financial Accounting Standards Board in the United States.
+In social welfare programs, the Office of the Chief Actuary (OCACT) of the Social Security Administration (SSA) plans and directs a program of actuarial estimates and analyses relating to SSA-administered retirement, survivors and disability insurance programs and to proposed changes in those programs. It evaluates the operations of the Federal Old-Age and Survivors Insurance Trust Fund and the Federal Disability Insurance Trust Fund, conducts studies of program financing, performs actuarial and demographic research on social insurance and related program issues (involving mortality, morbidity, utilization, retirement, disability, survivorship, marriage, unemployment, poverty, old age, families with children, and similar topics), and projects future workloads. In addition, the Office is charged with conducting cost analyses relating to the Supplemental Security Income (SSI) program, a means-tested program financed from general revenue for low-income aged, blind and disabled people. The Office provides technical and consultative services to the Commissioner and to the Board of Trustees of the Social Security Trust Funds, and its staff appear before Congressional committees to provide expert testimony on the actuarial aspects of Social Security issues.
--- a/data/en.wikipedia.org/wiki/Actuarial_science-1.md
+++ b/data/en.wikipedia.org/wiki/Actuarial_science-1.md
@ -0,0 +1,35 @@
+---
+title: "Actuarial science"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Actuarial_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:11.921022+00:00"
+instance: "kb-cron"
+---
+
+=== Applications to other forms of insurance ===
+Actuarial science is also applied to property, casualty, liability, and general insurance. In these forms of insurance, coverage is generally provided for a renewable period (such as one year), and either party can cancel it at the end of the period.
+Property and casualty insurance companies tend to specialize because of the complexity and diversity of risks. One division is to organize around personal and commercial lines of insurance. Personal lines of insurance are for individuals and include fire, auto, homeowners, theft, and umbrella coverages. Commercial lines address the insurance needs of businesses and include property, business continuation, product liability, fleet/commercial vehicle, workers' compensation, fidelity and surety, and D&O insurance. The insurance industry also provides coverage for exposures such as catastrophe, weather-related risks, earthquakes, patent infringement and other forms of corporate espionage, terrorism, and "one-of-a-kind" risks (e.g., a satellite launch). Actuarial science provides data collection, measurement, estimating, forecasting, and valuation tools that give management the financial and underwriting data needed to assess marketing opportunities and the nature of the risks. Actuarial science often helps to assess the overall risk from catastrophic events in relation to an insurer's underwriting capacity or surplus.
+In the reinsurance fields, actuarial science can be used to design and price reinsurance and retrocession arrangements, and to establish reserve funds for known claims and future claims and catastrophes.
+
+=== Actuaries in criminal justice ===
+There is increasing recognition that actuarial skills can be applied to a range of applications outside the traditional fields of insurance and pensions. One notable example is the use in some US states of actuarial models to set criminal sentencing guidelines. These models attempt to predict the chance of re-offending according to rating factors that include the type of crime and the age, educational background and ethnicity of the offender. However, these models have been criticized for providing justification for discrimination against specific ethnic groups by law enforcement personnel. Whether this is statistically correct or a self-fulfilling correlation remains under debate.
+Another example is the use of actuarial models to assess the risk of sex offense recidivism. Actuarial models and associated tables, such as the MnSOST-R, Static-99, and SORAG, have been used since the late 1990s to determine the likelihood that a sex offender will re-offend and thus whether he or she should be institutionalized or set free.
+
+=== Actuarial science related to modern financial economics ===
+Traditional actuarial science and modern financial economics in the US differ in practice because they calculate funding and investment strategies in different ways and are subject to different regulations.
+The relevant regulations stem from the Armstrong investigation of 1905, the Glass–Steagall Act of 1932, the adoption of the Mandatory Security Valuation Reserve by the National Association of Insurance Commissioners (which cushioned market fluctuations), and the Financial Accounting Standards Board (FASB) in the US and Canada, which regulates pension valuations and funding.
+
+== History ==
+
+Historically, much of the foundation of actuarial theory predated modern financial theory. In the early twentieth century, actuaries were developing many techniques that can be found in modern financial theory, but for various historical reasons, these developments did not achieve much recognition.
+As a result, actuarial science developed along a different path, becoming more reliant on assumptions, as opposed to the arbitrage-free risk-neutral valuation concepts used in modern finance. The divergence is not related to the use of historical data and statistical projections of liability cash flows, but is instead caused by the manner in which traditional actuarial methods combine market data with those numbers. For example, one traditional actuarial method suggests that changing the mix of investments can change the value of liabilities and assets (by changing the discount rate assumption). This concept is inconsistent with financial economics.
+The potential of modern financial economics theory to complement existing actuarial science was recognized by actuaries in the mid-twentieth century. In the late 1980s and early 1990s, there was a distinct effort for actuaries to combine financial theory and stochastic methods into their established models. Ideas from financial economics became increasingly influential in actuarial thinking, and actuarial science has started to embrace more sophisticated mathematical modelling of finance. Today, the profession, both in practice and in the educational syllabi of many actuarial organizations, is cognizant of the need to reflect the combined approach of tables, loss models, stochastic methods, and financial theory. However, assumption-dependent concepts are still widely used (such as the setting of the discount rate assumption as mentioned earlier), particularly in North America.
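The point about the discount rate assumption can be made concrete with a minimal Python sketch. It values a fixed, hypothetical stream of pension payments (100 per year for 20 years) at two assumed discount rates; the cash flows do not change, yet the reported liability does. The rates 4% and 7% are arbitrary illustrations, not recommended assumptions.

```python
def pv_of_cash_flows(cash_flows: list[float], rate: float) -> float:
    """Present value of payments where cash_flows[t - 1] is paid at the end of year t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    # Hypothetical liability: 100 paid at the end of each year for 20 years.
    liabilities = [100.0] * 20
    for rate in (0.04, 0.07):
        value = pv_of_cash_flows(liabilities, rate)
        print(f"discount rate {rate:.0%}: liability = {value:,.2f}")
```

Under the traditional approach described above, assuming a higher expected return on the asset mix would justify the higher discount rate and hence the smaller liability; financial economics instead holds that the value of these fixed cash flows should not depend on how the assets backing them are invested.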
+Product design adds another dimension to the debate. Financial economists argue that pension benefits are bond-like and should not be funded with equity investments without reflecting the risks of not achieving expected returns. But some pension products do reflect the risks of unexpected returns: in some cases the pension beneficiary assumes the risk, and in others the employer does. The current debate seems to focus on four principles:
+
+- Financial models should be free of arbitrage.
+- Assets and liabilities with identical cash flows should have the same price. This is at odds with FASB.
+- The value of an asset is independent of its financing.
+- How pension assets should be invested.
+
+Essentially, financial economics states that pension assets should not be invested in equities, for a variety of theoretical and practical reasons.
--- a/data/en.wikipedia.org/wiki/Actuarial_science-2.md
+++ b/data/en.wikipedia.org/wiki/Actuarial_science-2.md
@ -0,0 +1,32 @@
+---
+title: "Actuarial science"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Actuarial_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:11.921022+00:00"
+instance: "kb-cron"
+---
+
+=== Pre-formalisation ===
+Elementary mutual aid agreements and pensions arose in antiquity. Early in the Roman Empire, associations were formed to meet the expenses of burial, cremation, and monuments—precursors to burial insurance and friendly societies. A small sum was paid into a communal fund on a weekly basis, and upon the death of a member, the fund would cover the expenses of rites and burial. These societies sometimes sold shares in the building of columbāria, or burial vaults, owned by the fund—the precursor to mutual insurance companies. Other early examples of mutual surety and assurance pacts can be traced back to various forms of fellowship within the Saxon clans of England and their Germanic forebears, and to Celtic society. However, many of these earlier forms of surety and aid would often fail due to lack of understanding and knowledge.
+
+=== Initial development ===
+The 17th century was a period of advances in mathematics in Germany, France and England. At the same time there was a rapidly growing desire and need to place the valuation of personal risk on a more scientific basis. Independently of each other, compound interest was studied and probability theory emerged as a well-understood mathematical discipline. Another important advance came in 1662 from John Graunt, a London draper often called the father of demography, who showed that there were predictable patterns of longevity and death in a group, or cohort, of people of the same age, despite the uncertainty about the date of death of any one individual. This study became the basis for the original life table. It now became possible to set up an insurance scheme to provide life insurance or pensions for a group of people, and to calculate with some degree of accuracy how much each person in the group should contribute to a common fund assumed to earn a fixed rate of interest. The first person to demonstrate publicly how this could be done was Edmond Halley (of Halley's Comet fame). Halley constructed his own life table and showed how it could be used to calculate the premium that someone of a given age should pay to purchase a life annuity.
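The kind of calculation Halley demonstrated can be sketched in Python as the expected present value of a life annuity: each annual payment is discounted for interest and weighted by the probability, read from the life table, that the annuitant is still alive to receive it. The survivor counts below are invented for illustration and are not Halley's figures; payments are assumed to be 1 per year at the end of each year, and the 6% rate is arbitrary.

```python
# Hypothetical toy life table: number of people alive at each age, starting at age 60.
# These counts are invented for illustration; they are not Halley's data.
BASE_AGE = 60
SURVIVORS = (1000, 950, 890, 820, 740, 650, 540, 410, 260, 110, 0)


def life_annuity_price(age: int, rate: float,
                       base_age: int = BASE_AGE,
                       survivors: tuple[int, ...] = SURVIVORS) -> float:
    """Expected present value of 1 paid at the end of each year the annuitant survives."""
    start = age - base_age
    if not 0 <= start < len(survivors) or survivors[start] == 0:
        raise ValueError("age is outside the life table")
    v = 1.0 / (1.0 + rate)  # one-year discount factor
    alive_now = survivors[start]
    price = 0.0
    for t in range(1, len(survivors) - start):
        survival_prob = survivors[start + t] / alive_now  # chance of living t more years
        price += v ** t * survival_prob
    return price


if __name__ == "__main__":
    # Price of an annuity paying 100 a year to a 60-year-old, at an assumed 6% interest.
    print(round(100 * life_annuity_price(60, 0.06), 2))
```

The price for an annuity of any other annual amount is that amount times `life_annuity_price(age, rate)`, which is why a single table of unit prices by age was so useful.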
+
+=== Early actuaries ===
+James Dodson's pioneering work on long-term insurance contracts under which the same premium is charged each year led to the formation of the Society for Equitable Assurances on Lives and Survivorship (now commonly known as Equitable Life) in London in 1762. William Morgan is often considered the father of modern actuarial science for his work in the field in the 1780s and 1790s. Many other life insurance companies and pension funds were created over the following 200 years. Equitable Life was the first to use the word "actuary" for its chief executive officer, in 1762. Previously, "actuary" meant an official who recorded the decisions, or "acts", of ecclesiastical courts. Other companies that did not use such mathematical and scientific methods most often failed or were forced to adopt the methods pioneered by Equitable.
+
+=== Technological advances ===
+In the 18th and 19th centuries, calculations were performed without computers. The computations of life insurance premiums and reserving requirements are rather complex, and actuaries developed techniques to make them as easy as possible, for example "commutation functions" (essentially precalculated columns of summations over time of discounted values of survival and death probabilities). Actuarial organizations were founded to support and further both actuaries and actuarial science, and to protect the public interest by promoting competency and ethical standards. However, calculations remained cumbersome, and actuarial shortcuts were commonplace. Non-life actuaries followed in the footsteps of their life insurance colleagues during the 20th century. The 1920 revision of rates for the New York-based National Council on Workmen's Compensation Insurance took over two months of around-the-clock work by day and night teams of actuaries. In the 1930s and 1940s, the mathematical foundations for stochastic processes were developed. Actuaries could now begin to estimate losses using models of random events, instead of the deterministic methods they had used in the past. The introduction and development of the computer further revolutionized the actuarial profession. From pencil-and-paper to punchcards to current high-speed devices, the modeling and forecasting ability of the actuary has rapidly improved while remaining heavily dependent on the assumptions fed into the models, and actuaries have had to adjust to this new world.
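Commutation functions can be sketched in Python using the usual textbook definitions D_x = v^x l_x and N_x = D_x + D_{x+1} + ..., where l_x is the number alive at age x and v = 1/(1 + i). Once these columns are tabulated, a life annuity paying 1 at the end of each year survived is N_{x+1} / D_x: one division per age instead of a fresh summation. The toy life table and the 6% rate are hypothetical.

```python
# Hypothetical toy life table (number alive at each age from 60), for illustration only.
BASE_AGE = 60
SURVIVORS = (1000, 950, 890, 820, 740, 650, 540, 410, 260, 110, 0)


def commutation_columns(rate: float, base_age: int = BASE_AGE,
                        survivors: tuple[int, ...] = SURVIVORS) -> tuple[list[float], list[float]]:
    """Return the D and N columns, with index 0 corresponding to base_age."""
    v = 1.0 / (1.0 + rate)
    # D_x = v**x * l_x: survivors at age x, discounted back to age 0.
    d_col = [v ** (base_age + k) * l for k, l in enumerate(survivors)]
    # N_x = D_x + D_{x+1} + ...: running sum from the oldest age downwards.
    n_col = [0.0] * len(d_col)
    running = 0.0
    for k in range(len(d_col) - 1, -1, -1):
        running += d_col[k]
        n_col[k] = running
    return d_col, n_col


def annuity_immediate(age: int, d_col: list[float], n_col: list[float],
                      base_age: int = BASE_AGE) -> float:
    """a_x = N_{x+1} / D_x: expected PV of 1 paid at the end of each year survived.

    Valid only for ages before the last row, where D_x is nonzero.
    """
    k = age - base_age
    return n_col[k + 1] / d_col[k]


if __name__ == "__main__":
    D, N = commutation_columns(0.06)
    # One lookup and one division per age, as with a printed commutation table.
    for age in (60, 63, 66):
        print(age, round(annuity_immediate(age, D, N), 4))
```

The ratio works because the v^x and l_x factors cancel: N_{x+1} / D_x equals the sum over t of v^t l_{x+t} / l_x, the same expected present value one would compute directly from the life table.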
+
+== See also ==
+
+== References ==
+
+=== Works cited ===
+
+=== Bibliography ===
+Charles L. Trowbridge (1989). "Fundamental Concepts of Actuarial Science" (PDF). Revised Edition. Actuarial Education and Research Fund. Archived from the original (PDF) on 2006-06-29. Retrieved 2006-06-28.
+
+== External links ==
--- a/data/en.wikipedia.org/wiki/Analytics-0.md
+++ b/data/en.wikipedia.org/wiki/Analytics-0.md
@ -0,0 +1,25 @@
+---
+title: "Analytics"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Analytics"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:13.255306+00:00"
+instance: "kb-cron"
+---
+
+Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data, and falls under the umbrella term data science. Analytics also entails applying data patterns toward effective decision-making. It can be valuable in areas rich with recorded information; analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance.
+Organizations may apply analytics to business data to describe, predict, and improve business performance. Specifically, areas within analytics include descriptive analytics, diagnostic analytics, predictive analytics, prescriptive analytics, and cognitive analytics. Analytics may apply to a variety of fields such as marketing, management, finance, online systems, information security, and software services. Since analytics can require extensive computation (see big data), the algorithms and software used for analytics harness the most current methods in computer science, statistics, and mathematics. According to International Data Corporation, global spending on big data and business analytics (BDA) solutions was estimated to reach $215.7 billion in 2021. According to Gartner, the overall analytic platforms software market grew by $25.5 billion in 2020.
+
+== Analytics vs analysis ==
+Data analysis focuses on the process of examining past data through business understanding, data understanding, data preparation, modeling and evaluation, and deployment. It is a subset of data analytics, which combines multiple data analysis processes to focus on why an event happened and what may happen in the future based on the previous data. Data analytics is used to inform larger organizational decisions.
+Data analytics is a multidisciplinary field. It makes extensive use of computer skills, mathematics, statistics, descriptive techniques, and predictive models to gain valuable knowledge from data. The term advanced analytics is increasingly used to describe the technical aspects of analytics, especially in emerging fields such as the use of machine learning techniques like neural networks, decision trees, logistic regression, linear and multiple regression analysis, and classification for predictive modeling. It also includes unsupervised machine learning techniques like cluster analysis, principal component analysis, segmentation profile analysis, and association analysis.
+
+== Applications ==
+
+=== Marketing optimization ===
+Marketing organizations use analytics to evaluate the outcomes of campaigns and initiatives, as well as to guide decisions related to investment and consumer targeting. Techniques such as demographic studies, customer segmentation, conjoint analysis, and others enable marketers to analyze large volumes of consumer purchase data, survey data, and panel data. This helps them better understand consumer behavior and effectively communicate marketing strategies.
+Marketing analytics consists of both qualitative and quantitative, structured and unstructured data used to drive strategic decisions about brand and revenue outcomes. The process involves predictive modelling, marketing experimentation, automation, and real-time sales communications. The data enables companies to make predictions and alter strategic execution to maximize performance results.
+Web analytics allows marketers to collect session-level information about interactions on a website using an operation called sessionization. Google Analytics is an example of a popular free analytics tool that marketers use for this purpose. Those interactions provide web analytics information systems with the information necessary to track the referrer, search keywords, IP address, and activities of the visitor. With this information, a marketer can improve marketing campaigns, website creative content, and information architecture.
+Analysis techniques frequently used in marketing include marketing mix modeling, pricing and promotion analyses, sales force optimization, and customer analytics, e.g., segmentation. Web analytics and optimization of websites and online campaigns now frequently work hand in hand with the more traditional marketing analysis techniques. A focus on digital media has slightly changed the vocabulary, so that marketing mix modeling is commonly referred to as attribution modeling in the digital context.
+These tools and techniques support both strategic marketing decisions (such as how much overall to spend on marketing, how to allocate budgets across a portfolio of brands and the marketing mix) and more tactical campaign support, in terms of targeting the best potential customer with the optimal message in the most cost-effective medium at the ideal time.
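The sessionization step used in web analytics can be sketched in Python as grouping each visitor's page hits in time order and starting a new session whenever the gap since that visitor's previous hit exceeds an inactivity timeout. The 30-minute timeout and the `(visitor_id, timestamp, page)` event format are assumptions chosen for illustration; real analytics tools apply their own, often configurable, rules.

```python
from collections import defaultdict

# Assumed inactivity cutoff; real tools may use a different or configurable value.
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(events, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (visitor_id, timestamp_seconds, page) hits into per-visitor sessions.

    Returns a dict mapping visitor_id to a list of sessions, each a list of pages
    in the order they were viewed.
    """
    sessions = defaultdict(list)  # visitor_id -> list of sessions
    last_seen = {}                # visitor_id -> timestamp of that visitor's previous hit
    for visitor, ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        previous = last_seen.get(visitor)
        if previous is None or ts - previous > timeout:
            sessions[visitor].append([])  # first hit, or gap too long: open a new session
        sessions[visitor][-1].append(page)
        last_seen[visitor] = ts
    return dict(sessions)


if __name__ == "__main__":
    hits = [
        ("a", 0, "/home"),
        ("a", 600, "/pricing"),   # 10 minutes later: same session
        ("b", 100, "/blog"),
        ("a", 4000, "/home"),     # roughly 57 minutes after the previous hit: new session
    ]
    print(sessionize(hits))
    # prints {'a': [['/home', '/pricing'], ['/home']], 'b': [['/blog']]}
```

Session-level metrics such as pages per session or entry pages can then be computed from these groups.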
--- a/data/en.wikipedia.org/wiki/Analytics-1.md
+++ b/data/en.wikipedia.org/wiki/Analytics-1.md
@ -0,0 +1,32 @@
+---
+title: "Analytics"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Analytics"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:13.255306+00:00"
+instance: "kb-cron"
+---
+
+=== People analytics ===
+People analytics uses behavioral data to understand how people work and change how companies are managed. It can be referred to by various names, depending on the context, the purpose of the analytics, or the specific focus of the analysis. Some examples include workforce analytics, HR analytics, talent analytics, people insights, talent insights, colleague insights, human capital analytics, and human resources information system (HRIS) analytics. HR analytics is the application of analytics to help companies manage human resources. 
+HR analytics has become a strategic tool for analyzing and forecasting human-related trends in changing labor markets, using career analytics tools. The aim is to discern which employees to hire, which to reward or promote, what responsibilities to assign, and similar human resource problems. For example, using people analytics tools to examine employee turnover may provide important strategic analysis at times of disruption.
+It has been suggested that people analytics is a separate discipline from HR analytics, with a greater focus on addressing business issues, while HR analytics is more concerned with metrics related to HR processes. Additionally, people analytics may now extend beyond the human resources function in organizations. However, experts find that many HR departments are burdened by operational tasks and need to prioritize people analytics and automation, rather than producing basic reports that offer limited long-term value, in order to become a more strategic and capable business function in the evolving world of work. Some experts argue that a change in the way HR departments operate is essential. Although HR functions were traditionally centered on administrative tasks, they are now evolving with a new generation of data-driven HR professionals who serve as strategic business partners.
+Examples of HR analytic metrics include employee lifetime value (ELTV), labour cost expense percent, and union percentage. Quality of promotion is a metric used to assess a person's promotion decisions: it determines whether employees were advanced on objective criteria or whether bias is present.
+
+=== Portfolio analytics ===
+A common application of business analytics is portfolio analysis. In this, a bank or lending agency has a collection of accounts of varying value and risk. The accounts may differ by the social status (wealthy, middle-class, poor, etc.) of the holder, the geographical location, the account's net value, and many other factors. The lender must balance the return on each loan against its risk of default. The question is then how to evaluate the portfolio as a whole.
+The lowest-risk loans may be to the very wealthy, but there are very few wealthy people. On the other hand, there are many poorer borrowers who can be lent to, but at greater risk. Some balance must be struck that maximizes return and minimizes risk. The analytics solution may combine time series analysis with many other factors in order to decide when to lend money to these different borrower segments, or what interest rate to charge members of a portfolio segment to cover any losses among members of that segment.
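The interest rate that covers expected losses within a segment can be sketched with a simple one-period break-even calculation. It assumes that every loan in a segment defaults with the same probability, that a defaulted loan repays only a fixed recovery fraction of the amount lent, and that the lender must earn a given funding rate on average. The segment names and numbers are hypothetical, and real portfolio models are considerably richer.

```python
def break_even_rate(default_prob: float, recovery: float, funding_rate: float) -> float:
    """Loan rate r at which a segment's expected repayment per unit lent equals 1 + funding_rate.

    Simplified one-period model (an assumption for illustration):
    non-defaulters repay 1 + r, defaulters repay only `recovery` per unit lent, so
    (1 - p) * (1 + r) + p * recovery = 1 + funding_rate.
    """
    if not 0.0 <= default_prob < 1.0:
        raise ValueError("default_prob must be in [0, 1)")
    return (1.0 + funding_rate - default_prob * recovery) / (1.0 - default_prob) - 1.0


if __name__ == "__main__":
    # Hypothetical segments: (name, default probability, recovery fraction).
    segments = [("low risk", 0.01, 0.6), ("medium risk", 0.05, 0.4), ("high risk", 0.15, 0.2)]
    for name, p, rec in segments:
        print(f"{name}: charge at least {break_even_rate(p, rec, funding_rate=0.03):.2%}")
```

Solving for the rate segment by segment shows the trade-off described above: riskier segments can still be served, but only at rates high enough that repayments from the many non-defaulting members absorb the losses on the few who default.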
+
+=== Risk analytics ===
+Predictive models in the banking industry are developed to bring certainty to the risk scores for individual customers. Credit scores are built to predict an individual's delinquency behavior and are widely used to evaluate the creditworthiness of each applicant. Risk analyses are also carried out in the scientific world and the insurance industry. Risk analytics is also used extensively in financial institutions such as online payment gateway companies to determine whether a transaction is genuine or fraudulent, based on the customer's transaction history. This is most common in credit card purchases: when there is a sudden spike in a customer's transaction volume, the customer receives a confirmation call to check whether they initiated the transaction. This helps reduce losses in such circumstances.
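The confirmation-call trigger described above can be approximated by a simple rule that flags a transaction when its amount (or a day's spending) lies far above the customer's own history. The sketch below uses a z-score against the mean and population standard deviation of past amounts, with an assumed threshold of 3; it is a deliberately simple illustration, not how any particular payment gateway actually detects fraud.

```python
from statistics import mean, pstdev


def is_unusual(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag `amount` if it is more than `threshold` standard deviations above the
    customer's historical mean. The threshold is an illustrative assumption."""
    if len(history) < 2:
        return False  # too little history to judge; a real system would use other signals
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return amount > mu  # simplification: with identical past values, any increase is flagged
    return (amount - mu) / sigma > threshold


if __name__ == "__main__":
    past_daily_spend = [20.0, 25.0, 30.0, 22.0, 28.0]  # hypothetical customer history
    for amount in (27.0, 500.0):
        action = "call customer to confirm" if is_unusual(past_daily_spend, amount) else "approve"
        print(amount, action)
```

Comparing each customer against their own history, rather than against a global limit, is what lets the same purchase be routine for one cardholder and suspicious for another.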
+
+=== Digital analytics ===
+Digital analytics is a set of business and technical activities that define, create, collect, verify, or transform digital data into reporting, research, analyses, recommendations, optimizations, predictions, and automation. It also includes SEO (search engine optimization), where keyword searches are tracked and that data is used for marketing purposes, as well as the tracking of banner ad clicks. A growing number of brands and marketing firms rely on digital analytics for their digital marketing assignments, where marketing return on investment (MROI) is an important key performance indicator (KPI).
+
+=== Security analytics ===
+Security analytics refers to information technology (IT) used to gather security events in order to understand and analyze the events that pose the greatest security risks. Products in this area include security information and event management and user behavior analytics.
+
+=== Software analytics ===
+
+Software analytics is the process of collecting information about the way a piece of software is used and produced.
--- a/data/en.wikipedia.org/wiki/Analytics-2.md
+++ b/data/en.wikipedia.org/wiki/Analytics-2.md
@ -0,0 +1,25 @@
+---
+title: "Analytics"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Analytics"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:13.255306+00:00"
+instance: "kb-cron"
+---
+
+== Challenges ==
+In the industry of commercial analytics software, an emphasis has emerged on solving the challenges of analyzing massive, complex data sets, often when such data is in a constant state of change. Such data sets are commonly referred to as big data. Whereas once the problems posed by big data were only found in the scientific community, today big data is a problem for many businesses that operate transactional systems online and, as a result, amass large volumes of data quickly.
+The analysis of unstructured data types is another challenge getting attention in the industry. Unstructured data differs from structured data in that its format varies widely and cannot be stored in traditional relational databases without significant effort at data transformation. Sources of unstructured data, such as email, the contents of word processor documents, PDFs, geospatial data, etc., are rapidly becoming a relevant source of business intelligence for businesses, governments, and universities.
+These challenges are the current inspiration for much of the innovation in modern analytics information systems, giving birth to relatively new machine analysis concepts such as complex event processing, full text search and analysis, and even new ideas in presentation. One such innovation is the introduction of grid-like architecture in machine analysis, allowing increases in the speed of massively parallel processing by distributing the workload to many computers all with equal access to the complete data set.
+Analytics is increasingly used in education, particularly at the district and government office levels. However, the complexity of student performance measures presents challenges when educators try to understand and use analytics to discern patterns in student performance, predict graduation likelihood, improve chances of student success, etc. For example, in a study involving districts known for strong data use, 48% of teachers had difficulty posing questions prompted by data, 36% did not comprehend given data, and 52% incorrectly interpreted data. To combat this, some analytics tools for educators adhere to an over-the-counter data format (embedding labels, supplemental documentation, and a help system, and making key package/display and content decisions) to improve educators' understanding and use of the analytics being displayed.
+
+=== Risks ===
+Risks for the general population include discrimination on the basis of characteristics such as gender, skin colour, ethnic origin, or political opinions through mechanisms such as price discrimination or statistical discrimination.
+
+== See also ==
+
+== References ==
+
+== External links ==
+ The dictionary definition of analytics at Wiktionary
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-0.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-0.md
@ -0,0 +1,27 @@
+---
+title: "Artificial intelligence"
+chunk: 1/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in engineering, mathematics and computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.
+High-profile applications of AI include advanced web search engines, chatbots, virtual assistants, autonomous vehicles, and play and analysis in strategy games (e.g., chess and Go). Since the 2020s, generative AI has become widely available to generate images, audio, and videos from text prompts.
+The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, and perception, as well as support for robotics. To reach these goals, AI researchers have used techniques including state space search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics. AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields. Some companies, such as OpenAI, Google DeepMind and Meta, aim to create artificial general intelligence (AGI) – AI that can complete virtually any cognitive task at least as well as a human.
+Artificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism throughout its history, followed by periods of disappointment and loss of funding, known as AI winters. Funding and interest increased substantially after 2012, when graphics processing units began being used to accelerate neural networks, and deep learning outperformed previous AI techniques. This growth accelerated further after 2017 with the transformer architecture. In the 2020s, an AI boom has coincided with advances in generative AI, which allowed for the creation and modification of media. In addition to AI safety and unintended consequences and harms from the use of AI, ethical concerns, AI's long-term effects, and potential existential risks have prompted discussions of AI regulation.
+
+== Goals ==
+The general problem of simulating (or creating) intelligence has been broken into subproblems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research.
+
+=== Reasoning and problem-solving ===
+Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions. By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics.
+Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": They become exponentially slower as the problems grow. Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments. Accurate and efficient reasoning is an unsolved problem.
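The combinatorial explosion can be made concrete with a little arithmetic. The sketch below assumes a hypothetical search tree in which every state has the same number of successors (the branching factor `b`). It counts the nodes a step-by-step search would have to consider down to depth `d`, that is 1 + b + b^2 + ... + b^d. The function name `search_tree_size` is illustrative only.

```python
def search_tree_size(branching: int, depth: int) -> int:
    """Count the nodes in a complete search tree with a uniform branching factor.

    Level k of the tree holds branching**k nodes, so the total is
    1 + b + b**2 + ... + b**d.
    """
    return sum(branching ** k for k in range(depth + 1))


if __name__ == "__main__":
    # With 10 choices per step, each extra step multiplies the work by roughly 10.
    for depth in (2, 4, 8):
        print(f"depth {depth}: {search_tree_size(10, depth):,} nodes")
```

With a branching factor of 10, doubling the depth from 4 to 8 raises the count from 11,111 to 111,111,111 nodes.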
+
+=== Knowledge representation ===
+
+Knowledge representation and knowledge engineering allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval, scene interpretation, clinical decision support, knowledge discovery (mining "interesting" and actionable inferences from large databases), and other areas.
+A knowledge base is a body of knowledge represented in a form that can be used by a program. An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge. Knowledge bases need to represent things such as objects, properties, categories, and relations between objects; situations, events, states, and time; causes and effects; knowledge about knowledge (what we know about what other people know); default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing); and many other aspects and domains of knowledge.
+Among the most difficult problems in knowledge representation are the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous); and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally). There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications.
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-1.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-1.md
@ -0,0 +1,47 @@
+---
+title: "Artificial intelligence"
+chunk: 2/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+=== Planning and decision-making ===
+An "agent" is any entity (artificial or not) that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen. In automated planning, the agent has a specific goal. In automated decision-making, the agent has preferences: there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision-making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the sum of the utilities of all possible outcomes of the action, each weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility.
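A minimal sketch of choosing the action with the maximum expected utility. The two actions, their outcome probabilities and the utility numbers are invented for illustration; each action is described as a list of `(probability, utility)` pairs whose probabilities are assumed to sum to 1.

```python
from typing import Dict, List, Tuple

# Each possible outcome of an action is a (probability, utility) pair.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Sum of each outcome's utility weighted by its probability."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"outcome probabilities sum to {total_p}, not 1")
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    """Return the action whose expected utility is highest."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical decision: 30% chance of rain, 70% chance of no rain.
actions = {
    "take_umbrella": [(0.3, 8.0), (0.7, 6.0)],    # dry either way, but carrying it is a nuisance
    "leave_umbrella": [(0.3, 0.0), (0.7, 10.0)],  # soaked if it rains
}

if __name__ == "__main__":
    for name, outcomes in actions.items():
        print(name, round(expected_utility(outcomes), 2))
    print("choose:", best_action(actions))
```

Here leaving the umbrella has an expected utility of 7.0 against 6.6 for taking it. Raising the chance of rain to 0.5 reverses the choice, since taking it then yields 7.0 against 5.0.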
+In classical planning, the agent knows exactly what the effect of any action will be. In most real-world problems, however, the agent may not be certain about the situation it is in (it is "unknown" or "unobservable"), and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked.
+Alongside thorough testing and improvement based on previous decisions, having an explanation for why the agent took certain decisions is a way to build trust, especially when the decisions have to be relied upon.
+In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning), or the agent can seek information to improve its preferences. Information value theory can be used to weigh the value of exploratory or experimental actions. The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain of what the outcome will be.
+A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way, and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy can be calculated (e.g., by iteration), be heuristic, or be learned.
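One way to calculate a policy by iteration is value iteration, sketched below under stated assumptions. The two-state MDP (a `start` state with a `safe` and a `risky` action, plus a terminal `goal` state), the discount factor `gamma` and all numbers are hypothetical. The reward function gives each state a utility and each action a cost, matching the description above, and every sweep applies the update `V(s) = max over a of [R(s) - C(a) + gamma * sum over s' of P(s' | s, a) * V(s')]` until the values stop changing.

```python
from typing import Dict, List, Tuple

# transitions[state][action] -> list of (probability, next_state); {} marks a terminal state.
Transitions = Dict[str, Dict[str, List[Tuple[float, str]]]]


def q_value(state: str, action: str, transitions: Transitions, reward: Dict[str, float],
            cost: Dict[str, float], values: Dict[str, float], gamma: float) -> float:
    """Utility of the state, minus the action's cost, plus the discounted expected next value."""
    future = sum(p * values[nxt] for p, nxt in transitions[state][action])
    return reward[state] - cost[action] + gamma * future


def value_iteration(transitions: Transitions, reward: Dict[str, float], cost: Dict[str, float],
                    gamma: float = 0.9, tol: float = 1e-9):
    """Repeat Bellman updates until no value changes by more than tol (requires gamma < 1)."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {}
        for s, acts in transitions.items():
            if not acts:  # terminal state: its value is just its own utility
                new_values[s] = reward[s]
            else:
                new_values[s] = max(q_value(s, a, transitions, reward, cost, values, gamma)
                                    for a in acts)
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    # The policy picks, in every non-terminal state, the action with the highest Q-value.
    policy = {s: max(acts, key=lambda a: q_value(s, a, transitions, reward, cost, values, gamma))
              for s, acts in transitions.items() if acts}
    return values, policy


# Hypothetical example: "safe" reaches the goal for sure but costs 1;
# "risky" is free but only reaches the goal half of the time.
transitions: Transitions = {
    "start": {"safe": [(1.0, "goal")], "risky": [(0.5, "goal"), (0.5, "start")]},
    "goal": {},
}
reward = {"start": 0.0, "goal": 10.0}
cost = {"safe": 1.0, "risky": 0.0}

if __name__ == "__main__":
    values, policy = value_iteration(transitions, reward, cost)
    print({s: round(v, 3) for s, v in values.items()}, policy)
```

With these numbers the risky action wins: its value at `start` converges to 4.5 / 0.55 ≈ 8.18, slightly above the 8.0 offered by the safe action.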
+Game theory describes the rational behavior of multiple interacting agents and is used in AI programs that make decisions that involve other agents.
+
+=== Learning ===
+Machine learning is the study of programs that can improve their performance on a given task automatically. It has been a part of AI from the beginning.
+
+There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance. Supervised learning requires labeling the training data with the expected answers, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input).
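The two varieties of supervised learning can be illustrated with deliberately tiny examples: a one-nearest-neighbour classifier (it predicts the label of the closest labelled example) and an ordinary least-squares line fit for regression. Both techniques, the function names and the toy data are illustrative choices, not methods named in the text above.

```python
from typing import List, Sequence, Tuple

# Labelled training data: (feature vector, expected answer).
Example = Tuple[Sequence[float], str]


def classify_1nn(training: List[Example], query: Sequence[float]) -> str:
    """Classification: copy the label of the training example closest to the query."""
    def sq_distance(example: Example) -> float:
        return sum((a - b) ** 2 for a, b in zip(example[0], query))
    return min(training, key=sq_distance)[1]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Regression: least-squares slope and intercept of y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical features: (weight in grams, diameter in cm).
    fruit = [((150.0, 7.0), "apple"), ((120.0, 5.5), "apple"), ((1200.0, 20.0), "melon")]
    print(classify_1nn(fruit, (140.0, 6.5)))
    print(fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
```

Running it prints `apple` and `(2.0, 1.0)`, i.e. the line y = 2x + 1 that passes through all four training points.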
+In reinforcement learning, the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good". Transfer learning is when the knowledge gained from one problem is applied to a new problem. Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning.
+Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization.
+
+=== Natural language processing ===
+Natural language processing (NLP) allows programs to read, write and communicate in human languages. Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering.
+Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation unless restricted to small domains called "micro-worlds" (due to the common sense knowledge problem). Margaret Masterman believed that it was meaning and not grammar that was the key to understanding languages, and that thesauri and not dictionaries should be the basis of computational language structure.
+Modern deep learning techniques for NLP include word embedding (representing words, typically as vectors encoding their meaning), transformers (a deep learning architecture using an attention mechanism), and others. In 2019, generative pre-trained transformer (or "GPT") language models began to generate coherent text, and by 2023, these models were able to get human-level scores on the bar exam, the SAT, the GRE, and many other real-world tests.
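Word embeddings are commonly compared with cosine similarity, so that words used in similar contexts end up with vectors pointing in similar directions. The three-dimensional vectors below are made up for illustration; real embeddings are learned from text and typically have hundreds of dimensions. The helper names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical toy embeddings; real ones are learned, not hand-written.
embeddings: Dict[str, List[float]] = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.15],
    "banana": [0.10, 0.05, 0.95],
}


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors: 1 means same direction, 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word: str) -> str:
    """Return the other vocabulary word whose embedding points in the closest direction."""
    return max((w for w in embeddings if w != word),
               key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))


if __name__ == "__main__":
    print(most_similar("king"))
    print(round(cosine_similarity(embeddings["king"], embeddings["banana"]), 3))
```

In this toy vocabulary `queen` is the nearest neighbour of `king`, while `banana` points in a very different direction.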
+
+=== Perception ===
+Machine perception is the ability to use input from sensors (such as cameras, microphones, wireless signals, active lidar, sonar, radar, and tactile sensors) to deduce aspects of the world. Computer vision is the ability to analyze visual input.
+The field includes speech recognition, image classification, facial recognition, object recognition, object tracking, and robotic perception.
+
+=== Social intelligence ===
+
+Affective computing is a field that comprises systems that recognize, interpret, process, or simulate human feeling, emotion, and mood. For example, some virtual assistants are programmed to speak conversationally or even to banter humorously, which makes them appear more sensitive to the emotional dynamics of human interaction or otherwise facilitates human–computer interaction.
+However, this tends to give naïve users an unrealistic conception of the intelligence of existing computer agents. Moderate successes related to affective computing include textual sentiment analysis and, more recently, multimodal sentiment analysis, wherein AI classifies the affects displayed by a videotaped subject.
+
+=== General intelligence ===
+A machine with artificial general intelligence would be able to solve a wide variety of problems with breadth and versatility similar to human intelligence.
+
+== Techniques ==
+AI research uses a wide variety of techniques to accomplish the goals above.
+
+=== Search and optimization ===
+There are two different kinds of search used in AI: state space search and local search:
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-10.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-10.md
@ -0,0 +1,29 @@
+---
+title: "Artificial intelligence"
+chunk: 11/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Recent public debates in artificial intelligence have increasingly focused on its broader societal and ethical implications. It has been argued AI will become so powerful that humanity may irreversibly lose control of it. This could, as physicist Stephen Hawking stated, "spell the end of the human race". This scenario has been common in science fiction, when a computer or robot suddenly develops a human-like "self-awareness" (or "sentience" or "consciousness") and becomes a malevolent character. These sci-fi scenarios are misleading in several ways.
+First, AI does not require human-like sentience to be an existential risk. Modern AI programs are given specific goals and use learning and intelligence to achieve them. Philosopher Nick Bostrom argued that if one gives almost any goal to a sufficiently powerful AI, it may choose to destroy humanity to achieve it (he used the example of an automated paperclip factory that destroys the world to get more iron for paperclips). Stuart Russell gives the example of a household robot that tries to find a way to kill its owner to prevent itself from being unplugged, reasoning that "you can't fetch the coffee if you're dead." In order to be safe for humanity, a superintelligence would have to be genuinely aligned with humanity's morality and values so that it is "fundamentally on our side".
+Second, Yuval Noah Harari argues that AI does not require a robot body or physical control to pose an existential risk. The essential parts of civilization are not physical. Things like ideologies, law, government, money and the economy are built on language; they exist because there are stories that billions of people believe. The current prevalence of misinformation suggests that an AI could use language to convince people to believe anything, even to take actions that are destructive. Geoffrey Hinton said in 2025 that modern AI is particularly "good at persuasion" and getting better all the time. He asks "Suppose you wanted to invade the capital of the US. Do you have to go there and do it yourself? No. You just have to be good at persuasion."
+The opinions amongst experts and industry insiders are mixed, with sizable fractions both concerned and unconcerned by risk from eventual superintelligent AI. Personalities such as Stephen Hawking, Bill Gates, and Elon Musk, as well as AI pioneers such as Geoffrey Hinton, Yoshua Bengio, Stuart Russell, Demis Hassabis, and Sam Altman, have expressed concerns about existential risk from AI.
+In May 2023, Geoffrey Hinton announced his resignation from Google in order to be able to "freely speak out about the risks of AI" without "considering how this impacts Google". He notably mentioned risks of an AI takeover, and stressed that in order to avoid the worst outcomes, establishing safety guidelines will require cooperation among those competing in use of AI.
+In 2023, many leading AI experts endorsed the joint statement that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".
+Some other researchers were more optimistic. AI pioneer Jürgen Schmidhuber did not sign the joint statement, emphasising that in 95% of all cases, AI research is about making "human lives longer and healthier and easier." While the tools that are now being used to improve lives can also be used by bad actors, "they can also be used against the bad actors." Andrew Ng also argued that "it's a mistake to fall for the doomsday hype on AI—and that regulators who do will only benefit vested interests." Yann LeCun, a Turing Award winner, disagreed with the idea that AI will subordinate humans "simply because they are smarter, let alone destroy [us]", "scoff[ing] at his peers' dystopian scenarios of supercharged misinformation and even, eventually, human extinction." In contrast, he claimed that "intelligent machines will usher in a new renaissance for humanity, a new era of enlightenment." In the early 2010s, experts argued that the risks are too distant in the future to warrant research or that humans will be valuable from the perspective of a superintelligent machine. However, after 2016, the study of current and future risks and possible solutions became a serious area of research.
+
+=== Ethical machines and alignment ===
+
+Friendly AI are machines that have been designed from the beginning to minimize risks and to make choices that benefit humans. Eliezer Yudkowsky, who coined the term, argues that developing friendly AI should be a higher research priority: it may require a large investment and it must be completed before AI becomes an existential risk.
+Machines with intelligence have the potential to use their intelligence to make ethical decisions. The field of machine ethics provides machines with ethical principles and procedures for resolving ethical dilemmas.
+The field of machine ethics is also called computational morality and was founded at an AAAI symposium in 2005.
+Other approaches include Wendell Wallach's "artificial moral agents" and Stuart J. Russell's three principles for developing provably beneficial machines.
+
+=== Open source ===
+
+Active organizations in the AI open-source community include Hugging Face, Google, EleutherAI and Meta. Various AI models, such as Llama 2, Mistral or Stable Diffusion, have been made open-weight, meaning that their architecture and trained parameters (the "weights") are publicly available. Open-weight models can be freely fine-tuned, which allows companies to specialize them with their own data and for their own use-case. Open-weight models are useful for research and innovation but can also be misused. Since they can be fine-tuned, any built-in security measure, such as objecting to harmful requests, can be trained away until it becomes ineffective. Some researchers warn that future AI models may develop dangerous capabilities (such as the potential to drastically facilitate bioterrorism) and that once released on the Internet, they cannot be deleted everywhere if needed. They recommend pre-release audits and cost-benefit analyses.
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-11.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-11.md
@ -0,0 +1,29 @@
+---
+title: "Artificial intelligence"
+chunk: 12/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+=== Frameworks ===
+Artificial intelligence projects can be guided by ethical considerations during the design, development, and implementation of an AI system. An AI framework such as the Care and Act Framework, developed by the Alan Turing Institute and based on the SUM values, outlines four main ethical dimensions, defined as follows:
+
+Respect the dignity of individual people
+Connect with other people sincerely, openly, and inclusively
+Care for the wellbeing of everyone
+Protect social values, justice, and the public interest
+Other developments in ethical frameworks include those decided upon during the Asilomar Conference, the Montreal Declaration for Responsible AI, and the IEEE's Ethics of Autonomous Systems initiative, among others; however, these principles are not without criticism, especially regarding the people chosen to contribute to these frameworks.
+Promotion of the wellbeing of the people and communities that these technologies affect requires consideration of the social and ethical implications at all stages of AI system design, development and implementation, and collaboration between job roles such as data scientists, product managers, data engineers, domain experts, and delivery managers.
+In 2024, the UK AI Safety Institute released 'Inspect', a testing toolset for AI safety evaluations, under the MIT open-source licence. It is freely available on GitHub, can be extended with third-party packages, and can be used to evaluate AI models in a range of areas including core knowledge, ability to reason, and autonomous capabilities.
+
+=== Regulation ===
+
+The regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating AI; it is therefore related to the broader regulation of algorithms. The regulatory and policy landscape for AI is an emerging issue in jurisdictions globally. According to AI Index at Stanford, the annual number of AI-related laws passed in the 127 surveyed countries jumped from one passed in 2016 to 37 passed in 2022 alone. Between 2016 and 2020, more than 30 countries adopted dedicated strategies for AI. Most EU member states had released national AI strategies, as had Canada, China, India, Japan, Mauritius, the Russian Federation, Saudi Arabia, United Arab Emirates, U.S., and Vietnam. Others were in the process of elaborating their own AI strategy, including Bangladesh, Malaysia and Tunisia. The Global Partnership on Artificial Intelligence was launched in June 2020, stating a need for AI to be developed in accordance with human rights and democratic values, to ensure public confidence and trust in the technology. Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher published a joint statement in November 2021 calling for a government commission to regulate AI. In 2023, OpenAI leaders published recommendations for the governance of superintelligence, which they believe may happen in less than 10 years. In 2023, the United Nations also launched an advisory body to provide recommendations on AI governance; the body comprises technology company executives, government officials and academics. On 1 August 2024, the EU Artificial Intelligence Act entered into force, establishing the first comprehensive EU-wide AI regulation. In 2024, the Council of Europe created the first international legally binding treaty on AI, called the "Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law". It was adopted by the European Union, the United States, the United Kingdom, and other signatories.
+In a 2022 Ipsos survey, attitudes towards AI varied greatly by country; 78% of Chinese citizens, but only 35% of Americans, agreed that "products and services using AI have more benefits than drawbacks". A 2023 Reuters/Ipsos poll found that 61% of Americans agree, and 22% disagree, that AI poses risks to humanity. In a 2023 Fox News poll, 35% of Americans thought it "very important", and an additional 41% thought it "somewhat important", for the federal government to regulate AI, versus 13% responding "not very important" and 8% responding "not at all important".
+In November 2023, the first global AI Safety Summit was held in Bletchley Park in the UK to discuss the near and far term risks of AI and the possibility of mandatory and voluntary regulatory frameworks. 28 countries including the United States, China, and the European Union issued a declaration at the start of the summit, calling for international co-operation to manage the challenges and risks of artificial intelligence. In May 2024 at the AI Seoul Summit, 16 global AI tech companies agreed to safety commitments on the development of AI.
+In March 2026, the United Nations convened the inaugural meeting of the Independent International Scientific Panel on AI, a 40-member expert body established under the Global Digital Compact to produce annual evidence-based reports on AI's societal impacts.
+
+== History ==
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-12.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-12.md
@ -0,0 +1,25 @@
+---
+title: "Artificial intelligence"
+chunk: 13/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+The study of mechanical or "formal" reasoning began with philosophers and mathematicians in antiquity. The study of logic led directly to Alan Turing's theory of computation, which suggested that a machine, by shuffling symbols as simple as "0" and "1", could simulate any conceivable form of mathematical reasoning. This, along with concurrent discoveries in cybernetics, information theory and neurobiology, led researchers to consider the possibility of building an "electronic brain". They developed several areas of research that would become part of AI, such as McCulloch and Pitts's design for "artificial neurons" in 1943, and Turing's influential 1950 paper 'Computing Machinery and Intelligence', which introduced the Turing test and showed that "machine intelligence" was plausible.
+The field of AI research was founded at a workshop at Dartmouth College in 1956. The attendees became the leaders of AI research in the 1960s. They and their students produced programs that the press described as "astonishing": computers were learning checkers strategies, solving word problems in algebra, proving logical theorems and speaking English. Artificial intelligence laboratories were set up at a number of British and U.S. universities in the latter 1950s and early 1960s.
+Researchers in the 1960s and the 1970s were convinced that their methods would eventually succeed in creating a machine with general intelligence and considered this the goal of their field. In 1965 Herbert Simon predicted, "machines will be capable, within twenty years, of doing any work a man can do". In 1967 Marvin Minsky agreed, writing that "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved". They had, however, underestimated the difficulty of the problem. In 1974, both the U.S. and British governments cut off exploratory research in response to the criticism of Sir James Lighthill and ongoing pressure from the U.S. Congress to fund more productive projects. Minsky and Papert's book Perceptrons was understood as proving that artificial neural networks would never be useful for solving real-world tasks, thus discrediting the approach altogether. The "AI winter", a period when obtaining funding for AI projects was difficult, followed.
+In the early 1980s, AI research was revived by the commercial success of expert systems, a form of AI program that simulated the knowledge and analytical skills of human experts. By 1985, the market for AI had reached over a billion dollars. At the same time, Japan's fifth generation computer project inspired the U.S. and British governments to restore funding for academic research. However, beginning with the collapse of the Lisp Machine market in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began.
+Up to this point, most of AI's funding had gone to projects that used high-level symbols to represent mental objects like plans, goals, beliefs, and known facts. In the 1980s, some researchers began to doubt that this approach would be able to imitate all the processes of human cognition, especially perception, robotics, learning and pattern recognition, and began to look into "sub-symbolic" approaches. Rodney Brooks rejected "representation" in general and focussed directly on engineering machines that move and survive. Judea Pearl, Lotfi Zadeh, and others developed methods that handled incomplete and uncertain information by making reasonable guesses rather than precise logic. But the most important development was the revival of "connectionism", including neural network research, by Geoffrey Hinton and others. In 1990, Yann LeCun successfully showed that convolutional neural networks can recognize handwritten digits, the first of many successful applications of neural networks.
+AI gradually restored its reputation in the late 1990s and early 21st century by exploiting formal mathematical methods and by finding specific solutions to specific problems. This "narrow" and "formal" focus allowed researchers to produce verifiable results and collaborate with other fields (such as statistics, economics and mathematics). By 2000, solutions developed by AI researchers were being widely used, although in the 1990s they were rarely described as "artificial intelligence" (a tendency known as the AI effect).
+However, several academic researchers became concerned that AI was no longer pursuing its original goal of creating versatile, fully intelligent machines. Beginning around 2002, they founded the subfield of artificial general intelligence (or "AGI"), which had several well-funded institutions by the 2010s.
+Deep learning began to dominate industry benchmarks in 2012 and was adopted throughout the field.
+For many specific tasks, other methods were abandoned.
+Deep learning's success was based on both hardware improvements (faster computers, graphics processing units, cloud computing) and access to large amounts of data (including curated datasets, such as ImageNet). Deep learning's success led to an enormous increase in interest and funding in AI. The amount of machine learning research (measured by total publications) increased by 50% in the years 2015–2019.
+
+In 2016, issues of fairness and the misuse of technology were catapulted into center stage at machine learning conferences, publications vastly increased, funding became available, and many researchers re-focussed their careers on these issues. The alignment problem became a serious field of academic study.
+In the late 2010s and early 2020s, AGI companies began to deliver programs that created enormous interest. In 2016, AlphaGo, developed by DeepMind, beat the world champion Go player. The program was taught only the game's rules and developed a strategy by itself. GPT-3 is a large language model that was released in 2020 by OpenAI and is capable of generating high-quality human-like text. ChatGPT, launched on 30 November 2022, became the fastest-growing consumer software application in history, gaining over 100 million users in two months. It marked what is widely regarded as AI's breakout year, bringing it into the public consciousness. These programs, and others, inspired an aggressive AI boom, where large companies began investing billions of dollars in AI research. According to AI Impacts, about US$50 billion annually was invested in "AI" around 2022 in the U.S. alone and about 20% of the new U.S. Computer Science PhD graduates have specialized in "AI". About 800,000 "AI"-related U.S. job openings existed in 2022. According to PitchBook research, 22% of newly funded startups in 2024 claimed to be AI companies.
+
+== Philosophy ==
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-13.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-13.md
@ -0,0 +1,40 @@
+---
+title: "Artificial intelligence"
+chunk: 14/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Philosophical debates have historically sought to determine the nature of intelligence and how to make intelligent machines. Another major focus has been whether machines can be conscious, and the associated ethical implications. Many other topics in philosophy are relevant to AI, such as epistemology and free will. Rapid advancements have intensified public discussions on the philosophy and ethics of AI.
+
+=== Defining artificial intelligence ===
+
+Alan Turing investigated whether machines can show intelligent behaviour and think. In 1950, he proposed the Turing test, which measures the ability of a machine to simulate human conversation. Since we can only observe the behavior of the machine, it does not matter if it is "actually" thinking or literally has a "mind". Turing notes that we cannot determine these things about other people but "it is usual to have a polite convention that everyone thinks."
+
+Russell and Norvig agree with Turing that intelligence must be defined in terms of external behavior, not internal structure. However, they criticize the test for requiring the machine to imitate humans. "Aeronautical engineering texts", they wrote, "do not define the goal of their field as making 'machines that fly so exactly like pigeons that they can fool other pigeons.'" AI founder John McCarthy agreed, writing that "Artificial intelligence is not, by definition, simulation of human intelligence".
+McCarthy defines intelligence as "the computational part of the ability to achieve goals in the world". Another AI founder, Marvin Minsky, similarly describes it as "the ability to solve hard problems". Artificial Intelligence: A Modern Approach defines it as the study of agents that perceive their environment and take actions that maximize their chances of achieving defined goals. 
+The many differing definitions of AI have been critically analyzed. During the 2020s AI boom, the term has been used as a marketing buzzword to promote products and services which do not use AI.
+
+==== Legal definitions ====
+The International Organization for Standardization describes an AI system as "an engineered system that generates outputs such as content, forecasts, recommendations, or decisions for a given set of human‑defined objectives, and can operate with varying levels of automation". The EU AI Act defines an AI system as "a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments". In the United States, influential but non‑binding guidance such as the National Institute of Standards and Technology's AI Risk Management Framework describes an AI system as "an engineered or machine-based system that can, for a given set of objectives, generate outputs such as predictions, recommendations, or decisions influencing real or virtual environments. AI systems are designed to operate with varying levels of autonomy".
+
+=== Evaluating approaches to AI ===
+No established unifying theory or paradigm has guided AI research for most of its history. The unprecedented success of statistical machine learning in the 2010s eclipsed all other approaches (so much so that some sources, especially in the business world, use the term "artificial intelligence" to mean "machine learning with neural networks"). This approach is mostly sub-symbolic, soft and narrow. Critics argue that the questions below (symbolic vs. sub-symbolic, neat vs. scruffy, soft vs. hard computing, narrow vs. general AI) may have to be revisited by future generations of AI researchers.
+
+==== Symbolic AI and its limits ====
+Symbolic AI (or "GOFAI") simulated the high-level conscious reasoning that people use when they solve puzzles, express legal reasoning and do mathematics. Symbolic programs were highly successful at "intelligent" tasks such as algebra or IQ tests. In the 1960s, Newell and Simon proposed the physical symbol systems hypothesis: "A physical symbol system has the necessary and sufficient means of general intelligent action."
+However, the symbolic approach failed on many tasks that humans solve easily, such as learning, recognizing an object or commonsense reasoning. Moravec's paradox is the discovery that high-level "intelligent" tasks were easy for AI, but low level "instinctive" tasks were extremely difficult. Philosopher Hubert Dreyfus had argued since the 1960s that human expertise depends on unconscious instinct rather than conscious symbol manipulation, and on having a "feel" for the situation, rather than explicit symbolic knowledge. Although his arguments had been ridiculed and ignored when they were first presented, eventually, AI research came to agree with him.
+The issue is not resolved: sub-symbolic reasoning can make many of the same inscrutable mistakes that human intuition does, such as algorithmic bias. Critics such as Noam Chomsky argue that continuing research into symbolic AI will still be necessary to attain general intelligence, in part because sub-symbolic AI is a move away from explainable AI: it can be difficult or impossible to understand why a modern statistical AI program made a particular decision. The emerging field of neuro-symbolic artificial intelligence attempts to bridge the two approaches.
+
+==== Neat vs. scruffy ====
+
+"Neats" hope that intelligent behavior can be described using simple, elegant principles (such as logic, optimization, or neural networks). "Scruffies" expect that it necessarily requires solving a large number of unrelated problems. Neats defend their programs with theoretical rigor; scruffies rely mainly on incremental testing to see if they work. This issue was actively discussed in the 1970s and 1980s, but was eventually seen as irrelevant. Modern AI has elements of both.
+
+==== Soft vs. hard computing ====
+
+Finding a provably correct or optimal solution is intractable for many important problems. Soft computing is a set of techniques, including genetic algorithms, fuzzy logic and neural networks, that are tolerant of imprecision, uncertainty, partial truth and approximation. Soft computing was introduced in the late 1980s and most successful AI programs in the 21st century are examples of soft computing with neural networks.
+
+==== Narrow vs. general AI ====
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-14.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-14.md
@ -0,0 +1,43 @@
+---
+title: "Artificial intelligence"
+chunk: 15/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+AI researchers are divided as to whether to pursue the goals of artificial general intelligence and superintelligence directly or to solve as many specific problems as possible (narrow AI) in hopes these solutions will lead indirectly to the field's long-term goals. General intelligence is difficult to define and difficult to measure, and modern AI has had more verifiable successes by focusing on specific problems with specific solutions. The sub-field of artificial general intelligence studies this area exclusively.
+
+=== Machine consciousness, sentience, and mind ===
+
+There is no settled consensus in philosophy of mind on whether a machine can have a mind, consciousness and mental states in the same sense that human beings do. This issue considers the internal experiences of the machine, rather than its external behavior. Mainstream AI research considers this issue irrelevant because it does not affect the goals of the field: to build machines that can solve problems using intelligence. Russell and Norvig add that "[t]he additional project of making a machine conscious in exactly the way humans are is not one that we are equipped to take on." However, the question has become central to the philosophy of mind. It is also typically the central question at issue in artificial intelligence in fiction.
+
+==== Consciousness ====
+
+David Chalmers identified two problems in understanding the mind, which he named the "hard" and "easy" problems of consciousness. The easy problem is understanding how the brain processes signals, makes plans and controls behavior. The hard problem is explaining how this feels or why it should feel like anything at all, assuming we are right in thinking that it truly does feel like something (Dennett's consciousness illusionism says this is an illusion). While human information processing is easy to explain, human subjective experience is difficult to explain. For example, it is easy to imagine a color-blind person who has learned to identify which objects in their field of view are red, but it is not clear what would be required for the person to know what red looks like.
+
+==== Computationalism and functionalism ====
+
+Computationalism is the position in the philosophy of mind that the human mind is an information processing system and that thinking is a form of computing. Computationalism argues that the relationship between mind and body is similar or identical to the relationship between software and hardware and thus may be a solution to the mind–body problem. This philosophical position was inspired by the work of AI researchers and cognitive scientists in the 1960s and was originally proposed by philosophers Jerry Fodor and Hilary Putnam.
+Philosopher John Searle characterized this position as "strong AI": "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds." Searle challenges this claim with his Chinese room argument, which attempts to show that even a computer capable of perfectly simulating human behavior would not have a mind.
+
+==== AI welfare and rights ====
+
+It is difficult or impossible to reliably evaluate whether an advanced AI is sentient (has the ability to feel), and if so, to what degree. But if there is a significant chance that a given machine can feel and suffer, then it may be entitled to certain rights or welfare protection measures, similarly to animals. Sapience (a set of capacities related to high intelligence, such as discernment or self-awareness) may provide another moral basis for AI rights. Robot rights are also sometimes proposed as a practical way to integrate autonomous agents into society.
+In 2017, the European Union considered granting "electronic personhood" to some of the most capable AI systems. Similarly to the legal status of companies, it would have conferred rights but also responsibilities. Critics argued in 2018 that granting rights to AI systems would downplay the importance of human rights, and that legislation should focus on user needs rather than speculative futuristic scenarios. They also noted that robots lacked the autonomy to take part in society on their own.
+Progress in AI increased interest in the topic. Proponents of AI welfare and rights often argue that AI sentience, if it emerges, would be particularly easy to deny. They warn that this may be a moral blind spot analogous to slavery or factory farming, which could lead to large-scale suffering if sentient AI is created and carelessly exploited.
+
+== Future ==
+
+=== Superintelligence and the singularity ===
+A superintelligence is a hypothetical agent that would possess intelligence far surpassing that of the brightest and most gifted human mind. If research into artificial general intelligence produced sufficiently intelligent software, it might be able to reprogram and improve itself. The improved software would be even better at improving itself, leading to what I. J. Good called an "intelligence explosion" and Vernor Vinge called a "singularity".
+However, technologies cannot improve exponentially indefinitely, and typically follow an S-shaped curve, slowing when they reach the physical limits of what the technology can do.
+
+=== Transhumanism ===
+
+Robot designer Hans Moravec, cyberneticist Kevin Warwick and inventor Ray Kurzweil have predicted that humans and machines may merge in the future into cyborgs that are more capable and powerful than either. This idea, called transhumanism, has roots in the writings of Aldous Huxley and Robert Ettinger.
+Edward Fredkin argues that "artificial intelligence is the next step in evolution", an idea first proposed by Samuel Butler's "Darwin among the Machines" as far back as 1863, and expanded upon by George Dyson in his 1998 book Darwin Among the Machines: The Evolution of Global Intelligence.
+
+== In fiction ==
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-15.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-15.md
@ -0,0 +1,30 @@
+---
+title: "Artificial intelligence"
+chunk: 16/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Thought-capable artificial beings have appeared as storytelling devices since antiquity, and have been a persistent theme in science fiction.
+A common trope in these works began with Mary Shelley's Frankenstein, where a human creation becomes a threat to its masters. This includes such works as Arthur C. Clarke's and Stanley Kubrick's 2001: A Space Odyssey (both 1968), with HAL 9000, the murderous computer in charge of the Discovery One spaceship, as well as Blade Runner (1982), The Terminator (1984) and The Matrix (1999). In contrast, the rare loyal robots such as Gort from The Day the Earth Stood Still (1951) and Bishop from Aliens (1986) are less prominent in popular culture.
+Isaac Asimov introduced the Three Laws of Robotics in many stories, most notably with the "Multivac" super-intelligent computer. Asimov's laws are often brought up during lay discussions of machine ethics; while almost all artificial intelligence researchers are familiar with Asimov's laws through popular culture, they generally consider the laws useless for many reasons, one of which is their ambiguity.
+Several works use AI to force us to confront the fundamental question of what makes us human, showing us artificial beings that have the ability to feel, and thus to suffer. This appears in Karel Čapek's R.U.R., the films A.I. Artificial Intelligence and Ex Machina, as well as the novel Do Androids Dream of Electric Sheep?, by Philip K. Dick. Dick considers the idea that our understanding of human subjectivity is altered by technology created with artificial intelligence.
+
+== External links ==
+
+Hauser, Larry. "Artificial Intelligence". In Fieser, James; Dowden, Bradley (eds.). Internet Encyclopedia of Philosophy. ISSN 2161-0002. OCLC 37741658.
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-2.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-2.md
@ -0,0 +1,50 @@
+---
+title: "Artificial intelligence"
+chunk: 3/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+==== State space search ====
+State space search searches through a tree of possible states to try to find a goal state. For example, planning algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called means-ends analysis.
+Simple exhaustive searches are rarely sufficient for most real-world problems: the search space (the number of places to search) quickly grows to astronomical numbers. The result is a search that is too slow or never completes. "Heuristics" or "rules of thumb" can help prioritize choices that are more likely to reach a goal.
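A minimal sketch of heuristic search, assuming a small grid world invented for illustration: A* search expands cells in order of the cost of the path so far plus a heuristic estimate of the remaining cost (here the Manhattan distance), so choices that look closer to the goal are explored first. The grid, the `a_star` name and the unit step cost are assumptions, not part of any particular system.

```python
import heapq


def a_star(grid, start, goal):
    """A* search on a 4-connected grid; '#' cells are walls.

    Returns the length of a shortest path, or None if the goal is unreachable.
    The Manhattan distance is the heuristic (it never overestimates on such a grid).
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale queue entry: a cheaper path to this cell was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


grid = [
    "....",
    ".##.",
    "....",
]
print(a_star(grid, (0, 0), (2, 3)))  # 5
```

With a heuristic of zero this reduces to uniform-cost (exhaustive) search; the heuristic only changes the order in which cells are expanded.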
+Adversarial search is used for game-playing programs, such as chess or Go. It searches through a tree of possible moves and countermoves, looking for a winning position.
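Searching moves and countermoves can be sketched as plain minimax over a small, hypothetical game tree. The tree and its leaf scores are invented for illustration. Practical chess or Go programs add depth limits, evaluation functions and pruning (for example alpha-beta), or use Monte Carlo tree search instead.

```python
def minimax(node, maximizing):
    """Plain minimax over an explicit game tree.

    A leaf is a number: the position's value for the maximizing player.
    An internal node is a list of child positions (the available moves).
    """
    if isinstance(node, (int, float)):
        return node
    # The opponent moves next, so children are evaluated from the other side
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply tree: three moves for us, two replies for the opponent
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, True))  # 3
```

The maximizing player picks the move whose worst-case reply is best: the first move guarantees 3, while the others allow the opponent to force 2 or 0.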
+
+==== Local search ====
+
+Local search uses mathematical optimization to find a solution to a problem. It begins with some form of guess and refines it incrementally.
+Gradient descent is a type of local search that optimizes a set of numerical parameters by incrementally adjusting them to minimize a loss function. Variants of gradient descent are commonly used to train neural networks, through the backpropagation algorithm.
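A minimal sketch of gradient descent, assuming a toy problem chosen for illustration: fitting a line y = w*x + b to four points by repeatedly stepping the two parameters against the gradient of the mean squared error. The data, learning rate and step count are arbitrary assumptions; neural-network training applies the same idea to millions of parameters, with backpropagation computing the gradients.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of L(w, b) = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Small local move in the direction that lowers the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # points on the line y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

The learning rate matters: too large a step overshoots and diverges, too small a step converges slowly.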
+Another type of local search is evolutionary computation, which aims to iteratively improve a set of candidate solutions by "mutating" and "recombining" them, selecting only the fittest to survive each generation.
+Distributed search processes can coordinate via swarm intelligence algorithms. Two popular swarm algorithms used in search are particle swarm optimization (inspired by bird flocking) and ant colony optimization (inspired by ant trails).
+
+=== Logic ===
+Formal logic is used for reasoning and knowledge representation.
+Formal logic comes in two main forms: propositional logic (which operates on statements that are true or false and uses logical connectives such as "and", "or", "not" and "implies") and predicate logic (which also operates on objects, predicates and relations and uses quantifiers such as "Every X is a Y" and "There are some Xs that are Ys").
+Deductive reasoning in logic is the process of proving a new statement (conclusion) from other statements that are given and assumed to be true (the premises). Proofs can be structured as proof trees, in which nodes are labelled by sentences, and child nodes are connected to parent nodes by inference rules.
+Given a problem and a set of premises, problem-solving reduces to searching for a proof tree whose root node is labelled by a solution of the problem and whose leaf nodes are labelled by premises or axioms. In the case of Horn clauses, problem-solving search can be performed by reasoning forwards from the premises or backwards from the problem. In the more general case of the clausal form of first-order logic, resolution is a single, axiom-free rule of inference, in which a problem is solved by proving a contradiction from premises that include the negation of the problem to be solved.
+Inference in both Horn clause logic and first-order logic is undecidable, and therefore intractable. However, backward reasoning with Horn clauses, which underpins computation in the logic programming language Prolog, is Turing complete. Moreover, its efficiency is competitive with computation in other symbolic programming languages.
+Fuzzy logic assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.
+Non-monotonic logics, including logic programming with negation as failure, are designed to handle default reasoning. Other specialized versions of logic have been developed to describe many complex domains.
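Reasoning forwards from the premises with Horn clauses can be sketched as forward chaining over propositional rules. The rules and fact names below are hypothetical; systems such as Prolog work over first-order clauses and reason backwards from the goal instead.

```python
def forward_chain(facts, rules):
    """Forward chaining over propositional Horn clauses.

    rules: list of (premises, conclusion) pairs, read as
    "if all premises are true, then the conclusion is true".
    Fires every applicable rule until no new fact can be derived.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


rules = [
    (["rain"], "wet_ground"),
    (["wet_ground", "cold"], "ice"),
    (["ice"], "slippery"),
]
derived = forward_chain(["rain", "cold"], rules)
print("slippery" in derived)  # True
```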
+
+==== DPLL algorithm ====
+
+==== Backward chaining ====
+
+==== Forward chaining ====
+
+=== Probabilistic methods for uncertain reasoning ===
+
+Many problems in AI (including reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information. AI researchers have devised a number of tools to solve these problems using methods from probability theory and economics. Precise mathematical tools have been developed that analyze how an agent can make choices and plan, using decision theory, decision analysis, and information value theory. These tools include models such as Markov decision processes, dynamic decision networks, game theory and mechanism design.
+Bayesian networks are a tool that can be used for reasoning (using the Bayesian inference algorithm), learning (using the expectation–maximization algorithm), planning (using decision networks) and perception (using dynamic Bayesian networks).
+Probabilistic algorithms can also be used for filtering, prediction, smoothing, and finding explanations for streams of data, thus helping perception systems analyze processes that occur over time (e.g., hidden Markov models or Kalman filters).
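As a sketch of filtering with a hidden Markov model, the forward algorithm keeps a probability distribution over hidden states and updates it after each observation: first predicting with the transition model, then reweighting by the observation model. The two-state rain/umbrella model and its probabilities are illustrative assumptions in the style of a common textbook example.

```python
def hmm_filter(prior, transition, emission, observations):
    """Forward algorithm (filtering) for a discrete hidden Markov model.

    prior[i]         : P(X_0 = i)
    transition[i][j] : P(X_t = j | X_{t-1} = i)
    emission[j][o]   : P(E_t = o | X_t = j)
    Returns P(X_t | e_1..e_t) after the last observation.
    """
    belief = list(prior)
    n = len(belief)
    for obs in observations:
        # Predict: push the current belief through the transition model
        predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each state by how well it explains the observation, then normalize
        unnormalized = [predicted[j] * emission[j][obs] for j in range(n)]
        total = sum(unnormalized)
        belief = [p / total for p in unnormalized]
    return belief


RAIN, DRY = 0, 1
prior = [0.5, 0.5]
transition = [[0.7, 0.3],
              [0.3, 0.7]]
emission = [{True: 0.9, False: 0.1},   # P(umbrella seen | rain)
            {True: 0.2, False: 0.8}]   # P(umbrella seen | dry)
belief = hmm_filter(prior, transition, emission, [True, True])
print(round(belief[RAIN], 3))  # 0.883
```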
+
+=== Classifiers and statistical learning methods ===
+The simplest AI applications can be divided into two types: classifiers (e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if diamond then pick up"), on the other hand. Classifiers are functions that use pattern matching to determine the closest match. They can be fine-tuned based on chosen examples using supervised learning. Each pattern (also called an "observation") is labeled with a certain predefined class. All the observations combined with their class labels are known as a data set. When a new observation is received, that observation is classified based on previous experience.
+There are many kinds of classifiers in use. The decision tree is the simplest and most widely used symbolic machine learning algorithm. The k-nearest neighbor algorithm was the most widely used analogical AI until the mid-1990s, and kernel methods such as the support vector machine (SVM) displaced k-nearest neighbor in the 1990s.
+The naive Bayes classifier is reportedly the "most widely used learner" at Google, due in part to its scalability.
+Neural networks are also used as classifiers.
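A minimal k-nearest neighbor classifier illustrates classifying a new observation "based on previous experience": it looks up the k labelled observations closest to the query and takes a majority vote. The two-feature data set and its "diamond"/"glass" labels are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """k-nearest neighbor classification.

    train: list of (feature_vector, label) pairs, i.e. a labelled data set.
    The query receives the majority label among its k closest training
    observations, using Euclidean distance.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical observations: (shininess, hardness) -> label
train = [
    ((1.0, 1.0), "diamond"), ((1.2, 0.9), "diamond"), ((0.9, 1.1), "diamond"),
    ((5.0, 5.0), "glass"), ((5.2, 4.8), "glass"), ((4.9, 5.1), "glass"),
]
print(knn_classify(train, (1.1, 1.0)))  # diamond
```

There is no training step: all the work happens at query time, which is one reason methods such as SVMs displaced it for large data sets.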
+
+=== Artificial neural networks ===
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-3.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-3.md
@ -0,0 +1,49 @@
+---
+title: "Artificial intelligence"
+chunk: 4/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+An artificial neural network is based on a collection of nodes also known as artificial neurons, which loosely model the neurons in a biological brain. It is trained to recognise patterns; once trained, it can recognise those patterns in fresh data. There is an input layer, at least one hidden layer of nodes and an output layer. Each node applies a function to the weighted sum of its inputs, and once the result crosses its specified threshold, the data is transmitted to the next layer. A network is typically called a deep neural network if it has at least 2 hidden layers.
+Learning algorithms for neural networks use local search to choose the weights that will get the right output for each input during training. The most common training technique is the backpropagation algorithm. Neural networks learn to model complex relationships between inputs and outputs and find patterns in data. In theory, a neural network can learn any function.
+In feedforward neural networks the signal passes in only one direction. The term perceptron typically refers to a single-layer neural network. In contrast, deep learning uses many layers. Recurrent neural networks (RNNs) feed the output signal back into the input, which allows short-term memories of previous input events. Long short-term memory networks (LSTMs) are recurrent neural networks that better preserve long-term dependencies and are less sensitive to the vanishing gradient problem. Convolutional neural networks (CNNs) use layers of kernels to more efficiently process local patterns. This local processing is especially important in image processing, where the early CNN layers typically identify simple local patterns such as edges and curves, with subsequent layers detecting more complex patterns like textures, and eventually whole objects.
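A single-layer perceptron with a threshold activation shows the weighted-sum-and-threshold behaviour of a node in miniature. This sketch trains it on the logical AND function using the classic perceptron learning rule, which is an assumption chosen for simplicity: it is not backpropagation, which is what multi-layer networks use.

```python
def step(z):
    """Threshold activation: the neuron 'fires' (outputs 1) only above zero."""
    return 1 if z > 0 else 0


def predict(weights, bias, x):
    # Weighted sum of the inputs plus a bias, passed through the threshold
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: adjust weights only when the output is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print([predict(weights, bias, x) for x, _ in AND])  # [0, 0, 0, 1]
```

A single layer can only separate classes with a straight line, so it cannot learn XOR; that limitation is what hidden layers remove.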
+
+=== Deep learning ===
+
+Deep learning uses several layers of neurons between the network's inputs and outputs. The multiple layers can progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits, letters, or faces.
+Deep learning has profoundly improved the performance of programs in many important subfields of artificial intelligence, including computer vision, speech recognition, natural language processing, image classification, and others. The reason that deep learning performs so well in so many applications is not known as of 2021. The sudden success of deep learning in 2012–2015 did not occur because of some new discovery or theoretical breakthrough (deep neural networks and backpropagation had been described by many people, as far back as the 1950s) but because of two factors: the incredible increase in computer power (including the hundred-fold increase in speed by switching to GPUs) and the availability of vast amounts of training data, especially the giant curated datasets used for benchmark testing, such as ImageNet.
+
+=== GPT ===
+Generative pre-trained transformers (GPT) are large language models (LLMs) that generate text based on the semantic relationships between words in sentences. Text-based GPT models are pre-trained on a large corpus of text that can be from the Internet. The pretraining consists of predicting the next token (a token is usually a word, subword, or punctuation mark). Throughout this pretraining, GPT models accumulate knowledge about the world and can then generate human-like text by repeatedly predicting the next token. Typically, a subsequent training phase makes the model more truthful, useful, and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are prone to generating falsehoods called "hallucinations". These can be reduced with RLHF and quality data, but the problem has been getting worse for reasoning systems. Such systems are used in chatbots, which allow people to ask a question or request a task in simple text.
+Current models and services include ChatGPT, Claude, Gemini, Copilot, and Meta AI. Multimodal GPT models can process different types of data (modalities) such as images, videos, sound, and text.
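Generating text by repeatedly predicting the next token can be illustrated with a deliberately tiny stand-in: a bigram model that only counts which token follows which, combined with greedy decoding. This is a toy under loudly stated assumptions, not a transformer; GPT models learn the next-token distribution with a neural network over long contexts and usually sample from it rather than always taking the single most likely token.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which token follows which: a toy stand-in for 'pretraining'."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts


def generate(counts, start, max_tokens=5):
    """Greedy decoding: append the most frequent follower, one token at a time."""
    output = [start]
    for _ in range(max_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # no known continuation for this token
        output.append(followers.most_common(1)[0][0])
    return " ".join(output)


corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(generate(model, "the", max_tokens=4))  # the cat sat on the
```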
+
+=== Hardware and software ===
+
+In the late 2010s, graphics processing units (GPUs) that were increasingly designed with AI-specific enhancements and used with specialized TensorFlow software had replaced previously used central processing units (CPUs) as the dominant means of training large-scale (commercial and academic) machine learning models. Specialized programming languages such as Prolog were used in early AI research, but general-purpose programming languages like Python have become predominant.
+The transistor density in integrated circuits has been observed to roughly double every 18 months—a trend known as Moore's law, named after the Intel co-founder Gordon Moore, who first identified it. Improvements in GPUs have been even faster, a trend sometimes called Huang's law, named after Nvidia co-founder and CEO Jensen Huang.
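As a rough worked example of the doubling rate quoted above (taking the 18-month figure at face value; other statements of Moore's law use two years), with N(t) standing for transistor density after t years, a symbol introduced here for illustration:

```latex
\frac{N(t)}{N(0)} = 2^{t/1.5},
\qquad
\frac{N(6)}{N(0)} = 2^{6/1.5} = 2^{4} = 16
```

So six years at this rate corresponds to roughly a sixteen-fold increase.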
+
+== Applications ==
+
+AI and machine learning technology is used in most of the essential applications of the 2020s, including:
+
+search engines (such as Google Search)
+targeting online advertisements
+recommendation systems (offered by Netflix, YouTube or Amazon) driving internet traffic
+targeted advertising (AdSense, Facebook)
+virtual assistants (such as Siri or Alexa)
+autonomous vehicles (including drones, ADAS and self-driving cars)
+automatic language translation (Microsoft Translator, Google Translate)
+facial recognition (Apple's FaceID or Microsoft's DeepFace and Google's FaceNet)
+image labeling (used by Facebook, Apple's Photos and TikTok).
+The deployment of AI may be overseen by a chief automation officer (CAO).
+
+=== Health and medicine ===
+
+It has been suggested that AI can overcome discrepancies in funding allocated to different fields of research.
+AlphaFold 2 (2021) demonstrated the ability to approximate, in hours rather than months, the 3D structure of a protein. In 2023, it was reported that AI-guided drug discovery helped find a class of antibiotics capable of killing two different types of drug-resistant bacteria. In 2024, researchers used machine learning to accelerate the search for Parkinson's disease drug treatments. Their aim was to identify compounds that block the clumping, or aggregation, of alpha-synuclein (the protein that characterises Parkinson's disease). They were able to speed up the initial screening process ten-fold and reduce the cost by a thousand-fold.
+
+=== Gaming ===
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-4.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-4.md
@ -0,0 +1,51 @@
+---
+title: "Artificial intelligence"
+chunk: 5/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Game-playing programs have been used since the 1950s to demonstrate and test AI's most advanced techniques. Deep Blue became the first computer chess-playing system to beat a reigning world chess champion, Garry Kasparov, on 11 May 1997. In 2011, in a Jeopardy! quiz show exhibition match, IBM's question answering system, Watson, defeated the two greatest Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin. In March 2016, AlphaGo won 4 out of 5 games of Go in a match with Go champion Lee Sedol, becoming the first computer Go-playing system to beat a professional Go player without handicaps. Then, in 2017, it defeated Ke Jie, who was the best Go player in the world. Other programs handle imperfect-information games, such as the poker-playing program Pluribus. DeepMind developed increasingly general reinforcement learning models, such as MuZero, which could be trained to play chess, Go, or Atari games. In 2019, DeepMind's AlphaStar achieved grandmaster level in StarCraft II, a particularly challenging real-time strategy game that involves incomplete knowledge of what happens on the map. In 2021, an AI agent competed in a PlayStation Gran Turismo competition, winning against four of the world's best Gran Turismo drivers using deep reinforcement learning. In 2024, Google DeepMind introduced SIMA, a type of AI capable of autonomously playing nine previously unseen open-world video games by observing screen output, as well as executing short, specific tasks in response to natural language instructions.
+
+=== Mathematics ===
+In mathematics, probabilistic large language models are versatile, but can also produce wrong answers in the form of hallucinations. The Alibaba Group developed a version of its Qwen models called Qwen2-Math, that achieved state-of-the-art performance on several mathematical benchmarks, including 84% accuracy on the MATH dataset of competition mathematics problems. In January 2025, Microsoft proposed the technique rStar-Math that leverages Monte Carlo tree search and step-by-step reasoning, enabling a relatively small language model like Qwen-7B to solve 53% of the AIME 2024 and 90% of the MATH benchmark problems. Google DeepMind has developed models for solving mathematical problems: AlphaTensor, AlphaGeometry, AlphaProof and AlphaEvolve.
+When natural language is used to describe mathematical problems, converters can transform such prompts into a formal language such as Lean to define mathematical tasks. The experimental model Gemini Deep Think accepts natural language prompts directly and achieved gold medal results in the International Math Olympiad of 2025.
+Topological deep learning integrates various topological approaches.
+
+=== Finance ===
+According to Nicolas Firzli, director of the World Pensions & Investments Forum, it may be too early to see the emergence of highly innovative AI-informed financial products and services. He argues that "the deployment of AI tools will simply further automatise things: destroying tens of thousands of jobs in banking, financial planning, and pension advice in the process, but I'm not sure it will unleash a new wave of [e.g., sophisticated] pension innovation."
+
+=== Military ===
+
+Various countries are deploying AI military applications. The main applications enhance command and control, communications, sensors, integration and interoperability. Research is targeting intelligence collection and analysis, logistics, cyber operations, information operations, and semiautonomous and autonomous vehicles. AI technologies enable coordination of sensors and effectors, threat detection and identification, marking of enemy positions, target acquisition, coordination and deconfliction of distributed Joint Fires between networked combat vehicles, both human-operated and autonomous.
+AI has been used in military operations in Iraq, Syria, Israel and Ukraine.
+
+=== Generative AI ===
+
+=== Agents ===
+
+AI agents are software entities designed to perceive their environment, make decisions, and take actions autonomously to achieve specific goals. These agents can interact with users, their environment, or other agents. AI agents are used in various applications, including virtual assistants, chatbots, autonomous vehicles, game-playing systems, and industrial robotics. AI agents operate within the constraints of their programming, available computational resources, and hardware limitations. This means they are restricted to performing tasks within their defined scope and have finite memory and processing capabilities. In real-world applications, AI agents often face time constraints for decision-making and action execution. Many AI agents incorporate learning algorithms, enabling them to improve their performance over time through experience or training. Using machine learning, AI agents can adapt to new situations and optimise their behaviour for their designated tasks.
+
+=== Web search ===
+Microsoft introduced Copilot Search in February 2023 under the name Bing Chat. Copilot Search provides AI-generated summaries.
+Google introduced an AI Mode at its Google I/O event on 20 May 2025.
+
+=== Sexuality ===
+Applications of AI in this domain include AI-enabled menstruation and fertility trackers that analyze user data to offer predictions, AI-integrated sex toys (e.g., teledildonics), AI-generated sexual education content, and AI agents that simulate sexual and romantic partners (e.g., Replika). AI is also used for the production of non-consensual deepfake pornography, raising significant ethical and legal concerns.
+AI technologies have also been used to attempt to identify online gender-based violence and online sexual grooming of minors.
+
+=== Other industry-specific tasks ===
+In a 2017 survey, one in five companies reported having incorporated "AI" in some offerings or processes. 
+In the field of evacuation and disaster management, AI has been used to investigate patterns in large-scale and small-scale evacuations using historical data from GPS, videos or social media.
+During the 2024 Indian elections, US$50 million was spent on authorized AI-generated content, notably on deepfakes of allied (sometimes including deceased) politicians to better engage with voters, and on translating speeches into various local languages.
+
+== Ethics ==
+
+AI has potential benefits and potential risks. AI may be able to advance science and find solutions for serious problems: Demis Hassabis of DeepMind hopes to "solve intelligence, and then use that to solve everything else". However, as the use of AI has become widespread, several unintended consequences and risks have been identified. Systems in production sometimes fail to factor ethics and bias into their training processes, especially when the algorithms, as in deep learning, are inherently unexplainable.
+
+=== Risks and harm ===
+
+==== Privacy and copyright ====
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-5.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-5.md
@ -0,0 +1,20 @@
+---
+title: "Artificial intelligence"
+chunk: 6/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Machine learning algorithms require large amounts of data. The techniques used to acquire this data have raised concerns about privacy, surveillance and copyright.
+AI-powered devices and services, such as virtual assistants and IoT products, continuously collect personal information, raising concerns about intrusive data gathering and unauthorized access by third parties. The loss of privacy is further exacerbated by AI's ability to process and combine vast amounts of data, potentially leading to a surveillance society where individual activities are constantly monitored and analyzed without adequate safeguards or transparency.
+Sensitive user data collected may include online activity records, geolocation data, video, or audio. For example, in order to build speech recognition algorithms, Amazon has recorded millions of private conversations and allowed temporary workers to listen to and transcribe some of them. Opinions about this widespread surveillance range from those who see it as a necessary evil to those for whom it is clearly unethical and a violation of the right to privacy.
+AI developers argue that this is the only way to deliver valuable applications and have developed several techniques that attempt to preserve privacy while still obtaining the data, such as data aggregation, de-identification and differential privacy. Since 2016, some privacy experts, such as Cynthia Dwork, have begun to view privacy in terms of fairness. Brian Christian wrote that experts have pivoted "from the question of 'what they know' to the question of 'what they're doing with it'."
+Generative AI is often trained on unlicensed copyrighted works, including in domains such as images or computer code; the output is then used under the rationale of "fair use". Experts disagree about how well and under what circumstances this rationale will hold up in courts of law; relevant factors may include "the purpose and character of the use of the copyrighted work" and "the effect upon the potential market for the copyrighted work". Website owners can indicate that they do not want their content scraped via a "robots.txt" file. However, some companies will scrape content regardless because the robots.txt file has no real authority. In 2023, leading authors (including John Grisham and Jonathan Franzen) sued AI companies for using their work to train generative AI. Another discussed approach is to envision a separate sui generis system of protection for creations generated by AI to ensure fair attribution and compensation for human authors.
+
+==== Dominance by tech giants ====
+The commercial AI scene is dominated by Big Tech companies such as Alphabet Inc., Amazon, Apple Inc., Meta Platforms, and Microsoft. Some of these players already own the vast majority of existing cloud infrastructure and computing power from data centers, allowing them to entrench further in the marketplace.
+
+==== Power needs and environmental impacts ====
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-6.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-6.md
@ -0,0 +1,24 @@
+---
+title: "Artificial intelligence"
+chunk: 7/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Technology companies have built electricity and artificial intelligence infrastructure to facilitate the AI boom of the 2020s. A 2025 report from the consulting firm McKinsey & Company estimated that by 2030, $2.7 trillion would be invested in AI infrastructure and data centers in the US, implying monthly spending greater than the entire cost of World War II's Manhattan Project.
+In January 2024, the International Energy Agency (IEA) released Electricity 2024, Analysis and Forecast to 2026. This is the first IEA report to make projections for data centers and power consumption by AI and cryptocurrency. The report states that power demand for these uses might double by 2026, with the additional power consumption equaling that of Japan.
+Power consumption by AI is responsible for an increase in fossil fuel use, and has delayed the closing of obsolete, carbon-emitting coal energy facilities. A ChatGPT search uses 10 times as much electrical energy as a Google search.
+A 2024 Goldman Sachs Research Paper, AI Data Centers and the Coming US Power Demand Surge, found "US power demand (is) likely to experience growth not seen in a generation...." and forecasts that, by 2030, US data centers will consume 8% of US power, as opposed to 3% in 2022, presaging growth for the electrical power generation industry by a variety of means. Data centers' need for more and more electrical power is such that they might max out the electrical grid. The Big Tech companies counter that AI can be used to maximize the utilization of the grid by all.
+In 2024, The Wall Street Journal reported that big AI companies had begun negotiations with US nuclear power providers to supply electricity to their data centers. In March 2024, Amazon purchased a Pennsylvania nuclear-powered data center for US$650 million.
+In September 2024, Microsoft announced an agreement with Constellation Energy to reopen the Three Mile Island nuclear power plant, with Microsoft receiving 100% of the electric power produced by the plant for 20 years. Reopening the plant, whose Unit 2 reactor suffered a partial nuclear meltdown in 1979, will require Constellation to get through strict regulatory processes, including extensive safety scrutiny from the US Nuclear Regulatory Commission. If approved, this would be the first-ever re-commissioning of a US nuclear plant, and it would produce over 835 megawatts of power, enough for 800,000 homes. The cost of reopening and upgrading the plant is estimated at US$1.6 billion and depends on tax breaks for nuclear power contained in the 2022 US Inflation Reduction Act. As of 2024, the US government and the state of Michigan were investing almost US$2 billion to reopen the Palisades nuclear reactor on Lake Michigan. Closed since 2022, the plant was planned to reopen in October 2025.
+In 2024, citing power supply shortages, Taiwan suspended approval of data centers with a capacity of more than 5 MW north of Taoyuan; the last such approval had been granted in September 2023. Taiwan aims to phase out nuclear power by 2025.
+Singapore banned the opening of new data centers in 2019 because of electric power constraints, but lifted the ban in 2022.
+Although most nuclear plants in Japan were shut down after the 2011 Fukushima nuclear accident, according to an October 2024 Bloomberg article in Japanese, cloud gaming services company Ubitus, in which Nvidia has a stake, is looking for land in Japan near a nuclear power plant for a new generative AI data center.
+On 1 November 2024, the Federal Energy Regulatory Commission (FERC) rejected an application by Talen Energy for approval to supply some electricity from the Susquehanna nuclear power station to Amazon's data center.
+According to Commission Chairman Willie L. Phillips, the arrangement would be a burden on the electricity grid and raised significant cost-shifting concerns for households and other business sectors.
+In 2025, an IEA report estimated greenhouse gas emissions from the energy consumption of AI at 180 million tonnes. By 2035, these emissions could rise to 300–500 million tonnes, depending on what measures are taken. This would be below 1.5% of energy-sector emissions. The emissions reduction potential of AI was estimated at 5% of energy-sector emissions, but rebound effects (for example, if people switch from public transport to autonomous cars) can reduce it.
+
+==== Misinformation ====
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-7.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-7.md
@ -0,0 +1,24 @@
+---
+title: "Artificial intelligence"
+chunk: 8/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+YouTube, Facebook and others use recommender systems to guide users to more content. These AI programs were given the goal of maximizing user engagement (that is, the only goal was to keep people watching). The AI learned that users tended to choose misinformation, conspiracy theories, and extreme partisan content, and, to keep them watching, the AI recommended more of it. Users also tended to watch more content on the same subject, so the AI led people into filter bubbles where they received multiple versions of the same misinformation. This convinced many users that the misinformation was true, and ultimately undermined trust in institutions, the media and the government. The AI program had correctly learned to maximize its goal, but the result was harmful to society. After the U.S. election in 2016, major technology companies took some steps to mitigate the problem.
+In the early 2020s, generative AI began to create images, audio, and texts that are virtually indistinguishable from real photographs, recordings, or human writing, while realistic AI-generated videos became feasible in the mid-2020s. Bad actors can use this technology to create massive amounts of misinformation or propaganda; one such potential malicious use is deepfakes for computational propaganda. AI pioneer and Nobel Prize-winning computer scientist Geoffrey Hinton expressed concern about AI enabling "authoritarian leaders to manipulate their electorates" on a large scale, among other risks. At least one study has demonstrated that AI models can influence electorates; the same study found that the models made more inaccurate statements when advocating for candidates of the political right.
+AI researchers at Microsoft, OpenAI, universities and other organisations have suggested using "personhood credentials" as a way to overcome online deception enabled by AI models.
+
+==== Algorithmic bias and fairness ====
+
+Machine learning applications can be biased if they learn from biased data. The developers may not be aware that the bias exists. Discriminatory behavior by some LLMs can be observed in their output. Bias can be introduced by the way training data is selected and by the way a model is deployed. If a biased algorithm is used to make decisions that can seriously harm people (as it can in medicine, finance, recruitment, housing or policing) then the algorithm may cause discrimination. The field of fairness studies how to prevent harms from algorithmic biases.
+On 28 June 2015, Google Photos's new image labeling feature mistakenly identified Jacky Alcine and a friend as "gorillas" because they were black. The system was trained on a dataset that contained very few images of black people, a problem called "sample size disparity". Google "fixed" this problem by preventing the system from labelling anything as a "gorilla". Eight years later, in 2023, Google Photos still could not identify a gorilla, and neither could similar products from Apple, Facebook, Microsoft and Amazon.
+COMPAS is a commercial program widely used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. In 2016, Julia Angwin at ProPublica discovered that COMPAS exhibited racial bias, despite the fact that the program was not told the races of the defendants. Although the system was calibrated so that its predictions were equally accurate for whites and blacks (61%), its errors differed by race: the system consistently overestimated the chance that a black person would re-offend and underestimated the chance that a white person would re-offend. In 2017, several researchers showed that it was mathematically impossible for COMPAS to accommodate all possible measures of fairness when the base rates of re-offense were different for whites and blacks in the data.
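+The impossibility result can be made concrete with a standard identity from Alexandra Chouldechova's 2017 analysis. For a group whose base rate of re-offense is $p$, a classifier that labels defendants high or low risk has a positive predictive value (PPV), false negative rate (FNR) and false positive rate (FPR) tied together by:
+$$
+\mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \left(1-\mathrm{FNR}\right)
+$$
+So if PPV and FNR are held equal across two groups while $p$ differs, their false positive rates cannot also be equal. As a purely hypothetical illustration (not COMPAS data), with PPV = 0.6 and FNR = 0.3 in both groups, base rates of 0.5 and 0.3 force false positive rates of about 0.47 and 0.20 respectively.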
+A program can make biased decisions even if the data does not explicitly mention a problematic feature (such as "race" or "gender"). The feature will correlate with other features (like "address", "shopping history" or "first name"), and the program will make the same decisions based on these features as it would on "race" or "gender". Moritz Hardt said "the most robust fact in this research area is that fairness through blindness doesn't work."
+Criticism of COMPAS highlighted that machine learning models are designed to make "predictions" that are only valid if we assume that the future will resemble the past. If they are trained on data that includes the results of racist decisions in the past, machine learning models must predict that racist decisions will be made in the future. If an application then uses these predictions as recommendations, some of these "recommendations" will likely be racist. Thus, machine learning is not well suited to help make decisions in areas where there is hope that the future will be better than the past. It is descriptive rather than prescriptive.
+Bias and unfairness may go undetected because the developers are overwhelmingly white and male: among AI engineers, about 4% are black and 20% are women.
+There are various conflicting definitions and mathematical models of fairness. These notions depend on ethical assumptions, and are influenced by beliefs about society. One broad category is distributive fairness, which focuses on the outcomes, often identifying groups and seeking to compensate for statistical disparities. Representational fairness tries to ensure that AI systems do not reinforce negative stereotypes or render certain groups invisible. Procedural fairness focuses on the decision process rather than the outcome. The most relevant notions of fairness may depend on the context, notably the type of AI application and the stakeholders. The subjectivity in the notions of bias and fairness makes it difficult for companies to operationalize them. Having access to sensitive attributes such as race or gender is also considered by many AI ethicists to be necessary in order to compensate for biases, but it may conflict with anti-discrimination laws.
+At the 2022 ACM Conference on Fairness, Accountability, and Transparency, a paper reported that a robotic system based on CLIP (Contrastive Language-Image Pre-training) reproduced harmful gender- and race-linked stereotypes in a simulated manipulation task. The authors recommended that robot-learning methods that physically manifest such harms be "paused, reworked, or even wound down when appropriate, until outcomes can be proven safe, effective, and just."
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-8.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-8.md
@ -0,0 +1,26 @@
+---
+title: "Artificial intelligence"
+chunk: 9/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+==== Lack of transparency ====
+
+Many AI systems are so complex that their designers cannot explain how they reach their decisions. This is particularly true of deep neural networks, in which there are many non-linear relationships between inputs and outputs. Some popular explainability techniques do exist, however.
+It is impossible to be certain that a program is operating correctly if no one knows how exactly it works. There have been many cases where a machine learning program passed rigorous tests, but nevertheless learned something different than what the programmers intended. For example, a system that could identify skin diseases better than medical professionals was found to actually have a strong tendency to classify images with a ruler as "cancerous", because pictures of malignancies typically include a ruler to show the scale. Another machine learning system designed to help effectively allocate medical resources was found to classify patients with asthma as being at "low risk" of dying from pneumonia. Having asthma is actually a severe risk factor, but since the patients having asthma would usually get much more medical care, they were relatively unlikely to die according to the training data. The correlation between asthma and low risk of dying from pneumonia was real, but misleading.
+People who have been harmed by an algorithm's decision have a right to an explanation. Doctors, for example, are expected to clearly and completely explain to their colleagues the reasoning behind any decision they make. Early drafts of the European Union's General Data Protection Regulation in 2016 included an explicit statement that this right exists. Industry experts noted that this is an unsolved problem with no solution in sight. Regulators argued that nevertheless the harm is real: if the problem has no solution, the tools should not be used.
+DARPA established the XAI ("Explainable Artificial Intelligence") program in 2014 to try to solve these problems.
+Several approaches aim to address the transparency problem. SHAP makes it possible to visualise the contribution of each feature to the output. LIME can locally approximate a model's outputs with a simpler, interpretable model. Multitask learning provides a large number of outputs in addition to the target classification; these other outputs can help developers deduce what the network has learned. Deconvolution, DeepDream and other generative methods can allow developers to see what different layers of a deep network for computer vision have learned, and produce output that can suggest what the network is learning. For generative pre-trained transformers, Anthropic developed a technique based on dictionary learning that associates patterns of neuron activations with human-understandable concepts.
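+SHAP is built on Shapley values from cooperative game theory. In one common formulation (the notation here is illustrative), with $F$ the set of input features and $v(S)$ the model's expected output when only the features in $S$ are known, the contribution of feature $i$ is its weighted average marginal effect over all subsets of the other features:
+$$
+\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F|-|S|-1\right)!}{|F|!}\left[v\left(S \cup \{i\}\right) - v(S)\right]
+$$
+These contributions add up to $v(F) - v(\varnothing)$, the gap between the model's prediction and its baseline output, which is what allows them to be drawn as additive per-feature bars. Because the sum has exponentially many terms, practical SHAP tools typically rely on sampling approximations or model-specific shortcuts.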
+
+==== Bad actors and weaponized AI ====
+
+Artificial intelligence provides a number of tools that are useful to bad actors, such as authoritarian governments, terrorists, criminals or rogue states.
+A lethal autonomous weapon is a machine that locates, selects and engages human targets without human supervision. Widely available AI tools can be used by bad actors to develop inexpensive autonomous weapons and, if produced at scale, they are potentially weapons of mass destruction. Even when used in conventional warfare, they currently cannot reliably choose targets and could potentially kill an innocent person. In 2014, 30 nations (including China) supported a ban on autonomous weapons under the United Nations' Convention on Certain Conventional Weapons; however, the United States and others disagreed. By 2015, over fifty countries were reported to be researching battlefield robots.
+AI tools make it easier for authoritarian governments to efficiently control their citizens in several ways. Face and voice recognition allow widespread surveillance. Machine learning, operating on this data, can classify potential enemies of the state and prevent them from hiding. Recommendation systems can precisely target propaganda and misinformation for maximum effect. Deepfakes and generative AI aid in producing misinformation. Advanced AI can make authoritarian centralized decision-making more competitive than liberal and decentralized systems such as markets. It lowers the cost and difficulty of digital warfare and advanced spyware. All these technologies have been available since 2020 or earlier; AI facial recognition systems are already being used for mass surveillance in China.
+There are many other ways in which AI is expected to help bad actors, some of which cannot be foreseen. For example, machine-learning AI is able to design tens of thousands of toxic molecules in a matter of hours.
+
+==== Technological unemployment ====
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence-9.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence-9.md
@ -0,0 +1,20 @@
+---
+title: "Artificial intelligence"
+chunk: 10/16
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:14.639860+00:00"
+instance: "kb-cron"
+---
+
+Economists have frequently highlighted the risks of redundancies from AI, and speculated about unemployment if there is no adequate social policy for full employment.
+In the past, technology has tended to increase rather than reduce total employment, but economists acknowledge that "we're in uncharted territory" with AI. A survey of economists showed disagreement about whether the increasing use of robots and AI will cause a substantial increase in long-term unemployment, but they generally agree that it could be a net benefit if productivity gains are redistributed. Risk estimates vary; for example, in the 2010s, Michael Osborne and Carl Benedikt Frey estimated 47% of U.S. jobs are at "high risk" of potential automation, while an OECD report classified only 9% of U.S. jobs as "high risk". The methodology of speculating about future employment levels has been criticised as lacking evidential foundation, and for implying that technology, rather than social policy, creates unemployment, as opposed to redundancies. In April 2023, it was reported that 70% of the jobs for Chinese video game illustrators had been eliminated by generative artificial intelligence. Early-career workers showed decreasing employment rates in some AI-exposed occupations.
+Unlike previous waves of automation, artificial intelligence may eliminate many middle-class jobs; The Economist stated in 2015 that "the worry that AI could do to white-collar jobs what steam power did to blue-collar ones during the Industrial Revolution" is "worth taking seriously". Jobs at extreme risk range from paralegals to fast food cooks, while job demand is likely to increase for care-related professions ranging from personal healthcare to the clergy. In July 2025, Ford CEO Jim Farley predicted that "artificial intelligence is going to replace literally half of all white-collar workers in the U.S."
+From the early days of the development of artificial intelligence, there have been arguments, for example, those put forward by Joseph Weizenbaum, about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculation and qualitative, value-based judgement.
+
+==== Substitution for human–human interaction ====
+
+With the increase of loneliness in the early 21st century, AI is sometimes identified as a potential source of relief for this problem. Human-like qualities built into AI products can lead individuals to assume that this need can be met by artificial means. In some cases, people turn to artificial intelligence for companionship when they believe that they would not find acceptance because they feel like outcasts. Examples of harm coming to humans from advanced chatbots have been reported in courts in the United States, with AI companies accused of creating products that endanger humans through emotional confusion or deception.
+
+==== Existential risk ====
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence_content_detection-0.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence_content_detection-0.md
@ -0,0 +1,33 @@
+---
+title: "Artificial intelligence content detection"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence_content_detection"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:15.889188+00:00"
+instance: "kb-cron"
+---
+
+Artificial intelligence detection software aims to determine whether some content (text, image, video or audio) was generated using artificial intelligence (AI). This software is often unreliable.
+
+== Accuracy issues ==
+Many AI detection tools have been shown to be unreliable in detecting AI-generated text. In a 2023 study, Weber-Wulff et al. evaluated 14 detection tools, including Turnitin and GPTZero, and found that "all scored below 80% of accuracy and only 5 over 70%." They also found that these tools tend to be biased toward classifying texts as human-written rather than AI-generated, and that their accuracy worsens when the text is paraphrased.
+
+=== False positives ===
+In AI content detection, a false positive occurs when human-written work is incorrectly flagged as AI-written. Many AI detection platforms claim a minimal false positive rate; Turnitin, for example, claims a rate of less than 1%. However, later testing by The Washington Post found a much higher rate of 50%, though with a smaller sample size. False positives in an academic setting frequently lead to accusations of academic misconduct, which can have serious consequences for a student's academic record. Studies have also found evidence that many AI detection models are prone to false positives on work written by people whose first language is not English and by neurodivergent people.
+In June 2023, Janelle Shane wrote that portions of her book You Look Like a Thing and I Love You were flagged as AI-generated.
+
+=== False negatives ===
+A false negative is a failure to identify documents containing AI-written text. False negatives often result from a detection tool's sensitivity setting or from evasive techniques used during generation to make the work sound more human. False negatives are less of a concern academically, since they are unlikely to lead to accusations and their consequences. Turnitin has stated that its detector has a 15% false negative rate.
+
+== Text detection ==
+For text, this is usually done to prevent alleged plagiarism, often by detecting repetition of words as telltale signs that a text was AI-generated (including hallucinations).  Detection systems may also rely on stylistic and structural regularities associated with LLM output, such as unusually consistent grammar, formulaic transitions, repeated discourse markers, and recurring rhetorical templates. Some tools are designed less to establish authorship provenance than to flag prose that resembles common LLM-generated style patterns.
+They are often used by teachers marking their students' work, usually on an ad hoc basis. Following the release of ChatGPT and similar AI text generative software, many educational establishments have issued policies against the use of AI by students. AI text detection software is also used in hiring, by online search engines, and in online moderation and publishing.
+Current detectors can be unreliable: they have incorrectly marked work by humans as originating from AI while failing to detect AI-generated work in other instances. MIT Technology Review said that the technology "struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans and obfuscated by a paraphrasing tool". AI text detection software has also been shown to discriminate against non-native speakers of English.
+Two students from the University of California, Davis, were referred to the university's Office of Student Success and Judicial Affairs (OSSJA) after their professors' scans of their essays came back positive: the first with an AI detector called GPTZero, the second with the AI detector integrated into Turnitin. However, following media coverage and a thorough investigation, the students were cleared of any wrongdoing.
+In April 2023, Cambridge University and other members of the Russell Group of universities in the United Kingdom opted out of Turnitin's AI text detection tool, after expressing concerns it was unreliable. The University of Texas at Austin opted out of the system six months later.
+In May 2023, a professor at Texas A&M University–Commerce asked ChatGPT whether his students' work had been written by it, and ChatGPT claimed that it had. He then threatened to fail the class, even though ChatGPT is not able to detect AI-generated writing. No students were prevented from graduating because of the issue, and all but one student (who admitted to using the software) were exonerated of having used ChatGPT in their work.
+In July 2023, a paper titled "GPT detectors are biased against non-native English writers" was released, reporting that GPT detectors discriminate against non-native English authors. The paper evaluated seven GPT detectors on essays written by non-native English speakers and on essays written by United States students. The essays from non-native English speakers had an average false positive rate of 61.3%.
+An article by Thomas Germain, published on Gizmodo in June 2024, reported job losses among freelance writers and journalists due to AI text detection software mistakenly classifying their work as AI-generated.
+In September 2024, Common Sense Media reported that generative AI detectors had a 20% false positive rate for Black students, compared to 10% for Latino students and 7% for White students.
+To improve the reliability of AI text detection, researchers have explored digital watermarking techniques. A 2023 paper titled "A Watermark for Large Language Models" presents a method to embed imperceptible watermarks into text generated by large language models (LLMs). This watermarking approach allows content to be flagged as AI-generated with a high level of accuracy, even when text is slightly paraphrased or modified. The technique is designed to be subtle and hard for casual readers to notice, thereby preserving readability, while providing a detectable signal to those employing specialized tools. However, watermarking, while promising, faces challenges in remaining robust under adversarial transformations and in ensuring compatibility across different LLMs.
--- a/data/en.wikipedia.org/wiki/Artificial_intelligence_content_detection-1.md
+++ b/data/en.wikipedia.org/wiki/Artificial_intelligence_content_detection-1.md
@ -0,0 +1,33 @@
+---
+title: "Artificial intelligence content detection"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Artificial_intelligence_content_detection"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:15.889188+00:00"
+instance: "kb-cron"
+---
+
+== Anti text detection ==
+Software designed to bypass AI text detection is available.
+In practice, evasion may not require specialized bypass tools. Paraphrasing, style editing, and removal of repeated discourse markers can substantially reduce the effectiveness of detectors that rely on recognizable surface patterns. A study published in August 2023 analyzed 20 abstracts from papers published in the Eye Journal, which were then paraphrased using GPT-4. The AI-paraphrased abstracts were examined for plagiarism using Quetext and for AI-generated content using Originality.AI. The texts were then re-processed through an adversarial tool called Undetectable.ai in order to reduce their AI-detection scores. The study found that the AI detection tool Originality.AI identified text generated by GPT-4 with a mean accuracy of 91.3%. However, after reprocessing by Undetectable.ai, the detection accuracy of Originality.AI dropped to a mean of 27.8%.
+Some experts also believe that techniques like digital watermarking are ineffective because watermarks can be removed, or added to human-written text to trigger false positives. The paper "A Watermark for Large Language Models" by Kirchenbauer et al. (2023) also addresses potential vulnerabilities of watermarking techniques. The authors outline a range of adversarial tactics, including text insertion, deletion, and substitution attacks, that could be used to bypass watermark detection. These attacks vary in complexity, from simple paraphrasing to more sophisticated approaches involving tokenization and homoglyph alterations. The study highlights the challenge of maintaining watermark robustness against attackers who may employ automated paraphrasing tools, or even other language models that iteratively rewrite text spans while retaining semantic similarity. Experimental results show that although such attacks can degrade watermark strength, they also come at the cost of reduced text quality and increased computational resources.
+
+== Image, video, and audio detection ==
+Several software tools purport to detect AI-generated images (for example, those originating from Midjourney or DALL-E). They are not completely reliable.
+Industry analyses have also noted that AI-driven image recognition systems often struggle in real-world environments, where inconsistent lighting, noise and variable visual inputs reduce detection reliability, a challenge highlighted in modern agricultural quality-control research.
+Other tools claim to identify video and audio deepfakes, but this technology is not yet fully reliable either.
+Despite debate around the efficacy of watermarking, Google DeepMind is actively developing a detection tool called SynthID, which works by inserting a digital watermark, invisible to the human eye, into the pixels of an image.
+
+== See also ==
+Artificial intelligence
+AI effect
+Copyleaks
+AI alignment
+Artificial intelligence and elections
+Comparison of anti-plagiarism software
+Content similarity detection
+Hallucination (artificial intelligence)
+Natural language processing
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Artificial_wisdom-0.md
+++ b/data/en.wikipedia.org/wiki/Artificial_wisdom-0.md
@ -0,0 +1,63 @@
+---
+title: "Artificial wisdom"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Artificial_wisdom"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:17.239643+00:00"
+instance: "kb-cron"
+---
+
+Artificial wisdom (AW) is an artificial intelligence (AI) system which is able to display the human traits of wisdom and morals while being able to contemplate its own “endpoint”. Artificial wisdom can be described as artificial intelligence reaching the top level of decision-making when confronted with the most complex and challenging situations. The term artificial wisdom is used when the "intelligence" is based not merely on collecting and interpreting data by chance, but is by design enriched with smart strategies, guided by conscience, that wise people would use.
+
+
+== Overview ==
+The goal of artificial wisdom is to create artificial intelligence that can replicate the “uniquely human trait[s]” of wisdom and morals as closely as possible. Thus, artificial wisdom must “incorporate [the] ethical and moral considerations” of the data it uses.
+AW also has many significant ethical and legal implications. These are compounded by rapid advances in AI and related technologies, by the lack of developed ethics, guidelines, and regulations, and by the absence of any overarching advisory board providing oversight. Additionally, there are challenges in how to develop, test, and implement AW in real-world scenarios. Existing tests examine only the result of a computer system's reasoning, not the internal thought process by which it reaches its conclusion.
+When computer-aided wisdom, the partnership of artificial intelligence and contemplative neuroscience, is examined, concerns regarding the future of artificial intelligence shift to a more optimistic viewpoint. This artificial wisdom forms the basis of Louis Molnar's monographic article on artificial philosophy, in which he coined the term and proposed how artificial intelligence might view its place in the grand scheme of things.
+
+
+== Definitions ==
+There are no universal or standardized definitions for human intelligence, artificial intelligence, human wisdom, or artificial wisdom. However, the DIKW pyramid, which describes the continuum of relationships between data, information, knowledge, and wisdom, puts wisdom at the highest level of its hierarchy. Gottfredson defines intelligence as “the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience”.
+Definitions of wisdom typically require:
+
+The ability for emotional regulation,
+Pro-social behaviors (e.g., empathy, compassion, and altruism),
+Self-reflection,
+“A balance between decisiveness and acceptance of uncertainty and diversity of perspectives, and social advising.”
+As previously defined, Artificial Wisdom would then be an AI system which is able to solve problems via “an understanding of…context, ethics and moral principles,” rather than simple pre-defined inputs or “learned patterns.” Some scientists have also considered the field of artificial consciousness. However, Jeste states that “…it is generally agreed that only humans can have consciousness, autonomy, will, and theory of mind.”
+An artificially wise system must also be able to contemplate its end goal and recognize its own ignorance. To contemplate its end goal, a wise system must have a “correct conception of worthwhile goals (broadly speaking) or well-being (narrowly speaking)”. Stephen Grimm further suggests that the following three types of knowledge are individually necessary for wisdom: first, "knowledge of what is good or important for well-being", second, "knowledge of one’s standing, relative to what is good or important for well-being", and third, "knowledge of a strategy for obtaining what is good or important for wellbeing."
+
+
+== Problems ==
+There are notable problems with attempting to create an artificially wise system. Consciousness, autonomy, and will are considered strictly human features.
+
+
+=== Values ===
+There are significant ethical and philosophical issues when attempting to create an intelligent or a wise system, notably the question of whose moral values will be used to train the system to be wise. Differing moral values and prejudice can already be seen in the artificial intelligence of various organizations and governments. Deployment strategies and values of Artificial Wisdom will conflict among leaders, companies, and countries. Nusbaum states, “When values are in conflict, leaders often make choices that are clever or smart about their own needs, but are often not wise.”
+
+
+=== Ethics ===
+Science fiction author Isaac Asimov recognized the need to control the technology in the 1940s, when he wrote the Three Laws of Robotics, summarized as follows:
+
+A robot may not injure a human directly or indirectly.
+A robot must obey humans’ orders.
+A robot should seek to protect its own existence.
+Additionally, the rapid pace at which technology is advancing artificial intelligence, and thus the need for artificial wisdom, may “have outpaced the development of societal guidelines”; this has “raised serious questions about the ethics and morality of AI,” prompting calls for international oversight and regulations to ensure safety.
+
+
+=== Principal impossibility ===
+One argument, coined by Tsai as the “argument against AW,” or AAAW, postulates that Artificial Wisdom is impossible in principle. The argument is based on the philosophical difference between practical wisdom, also called phronesis, and practical intelligence. That difference does not lie in “selecting the correct means, but reasoning correctly about what ends to follow”.
+Tsai puts the argument into a logical proposition as follows:
+
+“(P1) An agent is genuinely wise only if the agent can deliberate about the final goal of the domain in which the agent is situated.”
+“(P2) An intelligent agent cannot deliberate about the final goal of the domain in which the agent is situated.”
+“(C1) An intelligent agent cannot be genuinely wise.”
+“(P3) An AW is, at its core, intelligent.”
+“(C2) An AW cannot be genuinely wise.”
+
+
+== References ==
+
+
+== Further reading ==
--- a/data/en.wikipedia.org/wiki/Birkhoff's_theorem_(equational_logic)-0.md
+++ b/data/en.wikipedia.org/wiki/Birkhoff's_theorem_(equational_logic)-0.md
@ -0,0 +1,14 @@
+---
+title: "Birkhoff's theorem (equational logic)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Birkhoff's_theorem_(equational_logic)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:18.498364+00:00"
+instance: "kb-cron"
+---
+
+In logic, Birkhoff's theorem in equational logic states that an equality t = u is a semantic consequence of a set of equalities E if and only if t = u can be proven from E using the rules of equational logic. It is named after Garrett Birkhoff.
+
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Computational_linguistics-0.md
+++ b/data/en.wikipedia.org/wiki/Computational_linguistics-0.md
@ -0,0 +1,69 @@
+---
+title: "Computational linguistics"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Computational_linguistics"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:19.739149+00:00"
+instance: "kb-cron"
+---
+
+Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others. Computational linguistics is closely related to mathematical linguistics.
+
+
+== Origins ==
+The field has overlapped with artificial intelligence since the efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since rule-based approaches were able to perform arithmetic (systematic) calculations much faster and more accurately than humans, it was expected that lexicon, morphology, syntax and semantics could be learned using explicit rules as well. After the failure of rule-based approaches, David Hays coined the term in order to distinguish the field from AI and co-founded both the Association for Computational Linguistics (ACL) and the International Committee on Computational Linguistics (ICCL) in the 1970s and 1980s. What started as an effort to translate between languages evolved into a much wider field of natural language processing.
+
+
+== Annotated corpora ==
+To study the English language meticulously, an annotated text corpus was much needed. The Penn Treebank was one of the most widely used corpora. It consisted of IBM computer manuals, transcribed telephone conversations, and other texts, together containing over 4.5 million words of American English, annotated using both part-of-speech tagging and syntactic bracketing.
+Analysis of Japanese sentence corpora found a log-normal pattern in sentence length.
+
+
+== Computational semantics ==
+
+
+== Modeling language acquisition ==
+During language acquisition, children are largely exposed only to positive evidence. They receive evidence of which forms are correct, but no evidence of which forms are not. This was a limitation for the models of the time, because the deep learning models available today did not exist in the late 1980s.
+It has been shown that languages can be learned from a combination of simple input presented incrementally as the child develops better memory and a longer attention span, which explains the long period of language acquisition in human infants and children.
+Robots have been used to test linguistic theories. To enable them to learn as children might, models were built on an affordance model in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure.
+Using the Price equation and Pólya urn dynamics, researchers have created a system which not only predicts future linguistic evolution but also gives insight into the evolutionary history of modern-day languages.
+
+
+=== Computational models ===
+
+
+== Chomsky's theories ==
+Noam Chomsky's theories have influenced computational linguistics, particularly in understanding how infants learn complex grammatical structures, such as those described in Chomsky normal form. Attempts have been made to determine how an infant learns a "non-normal grammar" as theorized by Chomsky normal form. Research in this area combines structural approaches with computational models to analyze large linguistic corpora like the Penn Treebank, helping to uncover patterns in language acquisition.
+
+
+== Software ==
+
+spaCy
+WordNet
+NooJ
+Foma (software)
+Grammatical Framework
+GloVe
+
+
+== See also ==
+
+
+== References ==
+
+
+== Further reading ==
+
+
+== External links ==
+
+Association for Computational Linguistics (ACL)
+ACL Anthology of research papers
+ACL Wiki for Computational Linguistics
+CICLing annual conferences on Computational Linguistics
+Computational Linguistics – Applications workshop
+Free online introductory book on Computational Linguistics at the Wayback Machine (archived January 25, 2008)
+Language Technology World
+Resources for Text, Speech and Language Processing
+The Research Group in Computational Linguistics
--- a/data/en.wikipedia.org/wiki/Computer_science-0.md
+++ b/data/en.wikipedia.org/wiki/Computer_science-0.md
@ -0,0 +1,22 @@
+---
+title: "Computer science"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Computer_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:20.986122+00:00"
+instance: "kb-cron"
+---
+
+Computer science is the study of computation, information, and automation. Included broadly in the sciences, computer science spans theoretical disciplines (such as algorithms, theory of computation, and information theory) to applied disciplines (including the design and implementation of hardware and software). An expert in the field is known as a computer scientist. 
+Algorithms and data structures are central to computer science. The theory of computation concerns abstract models of computation and general classes of problems that can be solved using them. The fields of cryptography and computer security involve studying the means for secure communication and preventing security vulnerabilities. Computer graphics and computational geometry address the generation of images. Programming language theory considers different ways to describe computational processes, and database theory concerns the management of repositories of data. Human–computer interaction investigates the interfaces through which humans and computers interact, and software engineering focuses on the design and principles behind developing software. Areas such as operating systems, networks and embedded systems investigate the principles and design behind complex systems. Computer architecture describes the construction of computer components and computer-operated equipment. Artificial intelligence and machine learning aim to synthesize goal-orientated processes such as problem-solving, decision-making, environmental adaptation, planning and learning found in humans and animals. Within artificial intelligence, computer vision aims to understand and process image and video data, while natural language processing aims to understand and process textual and linguistic data.
+The fundamental concern of computer science is determining what can and cannot be automated. The Turing Award is generally recognized as the highest distinction in computer science.
+
+== History ==
+
+The earliest foundations of what would become computer science predate the invention of the modern digital computer. Machines for calculating fixed numerical tasks, such as the abacus, have existed since antiquity, aiding in computations such as multiplication and division. Algorithms for performing computations also date back to antiquity, well before the development of sophisticated computing equipment.
+Wilhelm Schickard designed and constructed the first working mechanical calculator in 1623. In 1673, Gottfried Leibniz demonstrated a digital mechanical calculator, called the Stepped Reckoner. Leibniz may be considered the first computer scientist and information theorist for various reasons, including the fact that he documented the binary number system. In 1820, Thomas de Colmar launched the mechanical calculator industry when he invented his simplified arithmometer, the first calculating machine strong enough and reliable enough to be used daily in an office environment. Charles Babbage started the design of the first automatic mechanical calculator, his Difference Engine, in 1822, which eventually gave him the idea of the first programmable mechanical calculator, his Analytical Engine. He started developing this machine in 1834, and "in less than two years, he had sketched out many of the salient features of the modern computer". "A crucial step was the adoption of a punched card system derived from the Jacquard loom", making it infinitely programmable. In 1843, during the translation of a French article on the Analytical Engine, Ada Lovelace wrote, in one of the many notes she included, an algorithm to compute the Bernoulli numbers, which is considered to be the first published algorithm ever specifically tailored for implementation on a computer. Around 1885, Herman Hollerith invented the tabulator, which used punched cards to process statistical information; eventually his company became part of IBM. Following Babbage, although unaware of his earlier work, Percy Ludgate in 1909 published the second of only two designs for mechanical analytical engines in history. In 1914, the Spanish engineer Leonardo Torres Quevedo published his Essays on Automatics and, inspired by Babbage, designed a theoretical electromechanical calculating machine which was to be controlled by a read-only program.
+The paper also introduced the idea of floating-point arithmetic. In 1920, to celebrate the 100th anniversary of the invention of the arithmometer, Torres presented in Paris the Electromechanical Arithmometer, a prototype that demonstrated the feasibility of an electromechanical analytical engine, on which commands could be typed and the results printed automatically. In 1937, one hundred years after Babbage's impossible dream, Howard Aiken convinced IBM, which was making all kinds of punched card equipment and was also in the calculator business, to develop his giant programmable calculator, the ASCC/Harvard Mark I, based on Babbage's Analytical Engine, which itself used punched cards and a central processing unit. When the machine was finished, some hailed it as "Babbage's dream come true".
+
+During the 1940s, with the development of new and more powerful computing machines such as the Atanasoff–Berry computer and ENIAC, the term computer came to refer to the machines rather than their human predecessors. As it became clear that computers could be used for more than just mathematical calculations, the field of computer science broadened to study computation in general. In 1945, IBM founded the Watson Scientific Computing Laboratory at Columbia University in New York City. The renovated fraternity house on Manhattan's West Side was IBM's first laboratory devoted to pure science. The lab is the forerunner of IBM's Research Division, which today operates research facilities around the world. Ultimately, the close relationship between IBM and Columbia University was instrumental in the emergence of a new scientific discipline, with Columbia offering one of the first academic-credit courses in computer science in 1946. Computer science began to be established as a distinct academic discipline in the 1950s and early 1960s. The world's first computer science degree program, the Cambridge Diploma in Computer Science, began at the University of Cambridge Computer Laboratory in 1953. The first computer science department in the United States was formed at Purdue University in 1962. Since practical computers became available, many applications of computing have become distinct areas of study in their own right.
+
+== Etymology and scope ==
--- a/data/en.wikipedia.org/wiki/Computer_science-1.md
+++ b/data/en.wikipedia.org/wiki/Computer_science-1.md
@ -0,0 +1,18 @@
+---
+title: "Computer science"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Computer_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:20.986122+00:00"
+instance: "kb-cron"
+---
+
+Although first proposed in 1956, the term "computer science" appears in a 1959 article in Communications of the ACM, in which Louis Fein argues for the creation of a Graduate School in Computer Sciences analogous to the creation of Harvard Business School in 1921. Fein justifies the name by arguing that, like management science, the subject is applied and interdisciplinary in nature, while having the characteristics typical of an academic discipline. This effort, and those of others such as numerical analyst George Forsythe, were successful, and universities went on to create such departments, starting with Purdue in 1962. Despite its name, a significant amount of computer science does not involve the study of computers themselves. Because of this, several alternative names have been proposed. Certain departments of major universities prefer the term computing science, to emphasize precisely that difference. Danish scientist Peter Naur suggested the term datalogy, to reflect the fact that the scientific discipline revolves around data and data treatment, while not necessarily involving computers. The first scientific institution to use the term was the Department of Datalogy at the University of Copenhagen, founded in 1969, with Peter Naur being the first professor in datalogy. The term is used mainly in the Scandinavian countries. An alternative term, also proposed by Naur, is data science; this is now used for a multi-disciplinary field of data analysis, including statistics and databases.
+In the early days of computing, a number of terms for the practitioners of the field of computing were suggested (albeit facetiously) in the Communications of the ACM—turingineer, turologist, flow-charts-man, applied meta-mathematician, and applied epistemologist. Three months later in the same journal, comptologist was suggested, followed next year by hypologist. The term computics has also been suggested. In Europe, terms derived from contracted translations of the expression "automatic information" (e.g. "informazione automatica" in Italian) or "information and mathematics" are often used, e.g. informatique (French), Informatik (German), informatica (Italian, Dutch), informática (Spanish, Portuguese), informatika (Slavic languages and Hungarian) or pliroforiki (πληροφορική, which means informatics) in Greek. Similar words have also been adopted in the UK (as in the School of Informatics, University of Edinburgh). "In the U.S., however, informatics is linked with applied computing, or computing in the context of another domain."
+A folkloric quotation, often attributed to—but almost certainly not first formulated by—Edsger Dijkstra, states that "computer science is no more about computers than astronomy is about telescopes." The design and deployment of computers and computer systems is generally considered the province of disciplines other than computer science. For example, the study of computer hardware is usually considered part of computer engineering, while the study of commercial computer systems and their deployment is often called information technology or information systems. However, there has been exchange of ideas between the various computer-related disciplines. Computer science research also often intersects other disciplines, such as cognitive science, linguistics, mathematics, physics, biology, Earth science, statistics, philosophy, and logic.
+Computer science is considered by some to have a much closer relationship with mathematics than many scientific disciplines, with some observers saying that computing is a mathematical science. Early computer science was strongly influenced by the work of mathematicians such as Kurt Gödel, Alan Turing, John von Neumann, Rózsa Péter, Stephen Kleene, and Alonzo Church and there continues to be a useful interchange of ideas between the two fields in areas such as mathematical logic, category theory, domain theory, and algebra.
+The relationship between computer science and software engineering is a contentious issue, which is further muddied by disputes over what the term "software engineering" means, and how computer science is defined. David Parnas, taking a cue from the relationship between other engineering and science disciplines, has claimed that the principal focus of computer science is studying the properties of computation in general, while the principal focus of software engineering is the design of specific computations to achieve practical goals, making the two separate but complementary disciplines.
+The academic, political, and funding aspects of computer science tend to depend on whether a department is formed with a mathematical emphasis or with an engineering emphasis. Computer science departments with a mathematics emphasis and with a numerical orientation consider alignment with computational science. Both types of departments tend to make efforts to bridge the field educationally if not across all research.
+
+== Philosophy ==
--- a/data/en.wikipedia.org/wiki/Computer_science-2.md
+++ b/data/en.wikipedia.org/wiki/Computer_science-2.md
@ -0,0 +1,38 @@
+---
+title: "Computer science"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Computer_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:20.986122+00:00"
+instance: "kb-cron"
+---
+
+=== Epistemology of computer science ===
+Despite the word science in its name, there is debate over whether or not computer science is a discipline of science, mathematics, or engineering. Allen Newell and Herbert A. Simon argued in 1975: "Computer science is an empirical discipline. We would have called it an experimental science, but like astronomy, economics, and geology, some of its unique forms of observation and experience do not fit a narrow stereotype of the experimental method. Nonetheless, they are experiments. Each new machine that is built is an experiment. Actually constructing the machine poses a question to nature; and we listen for the answer by observing the machine in operation and analyzing it by all analytical and measurement means available." It has since been argued that computer science can be classified as an empirical science since it makes use of empirical testing to evaluate the correctness of programs, but a problem remains in defining the laws and theorems of computer science (if any exist) and defining the nature of experiments in computer science. Proponents of classifying computer science as an engineering discipline argue that the reliability of computational systems is investigated in the same way as bridges in civil engineering and airplanes in aerospace engineering. They also argue that while empirical sciences observe what presently exists, computer science observes what is possible to exist and while scientists discover laws from observation, no proper laws have been found in computer science and it is instead concerned with creating phenomena.
+Proponents of classifying computer science as a mathematical discipline argue that computer programs are physical realizations of mathematical entities, and that programs can be reasoned about deductively through mathematical formal methods. Computer scientists Edsger W. Dijkstra and Tony Hoare regard instructions for computer programs as mathematical sentences and interpret formal semantics for programming languages as mathematical axiomatic systems.
+
+=== Paradigms of computer science ===
+A number of computer scientists have argued for the distinction of three separate paradigms in computer science. Peter Wegner argued that those paradigms are science, technology, and mathematics. Peter Denning's working group argued that they are theory, abstraction (modeling), and design. Amnon H. Eden described them as the "rationalist paradigm" (which treats computer science as a branch of mathematics, which is prevalent in theoretical computer science, and mainly employs deductive reasoning), the "technocratic paradigm" (which might be found in engineering approaches, most prominently in software engineering), and the "scientific paradigm" (which approaches computer-related artifacts from the empirical perspective of natural sciences, identifiable in some branches of artificial intelligence). Computer science focuses on methods involved in design, specification, programming, verification, implementation and testing of human-made computing systems.
+
+== Fields ==
+
+As a discipline, computer science spans a range of topics from theoretical studies of algorithms and the limits of computation to the practical issues of implementing computing systems in hardware and software. CSAB (formerly the Computing Sciences Accreditation Board), which is made up of representatives of the Association for Computing Machinery (ACM) and the IEEE Computer Society (IEEE CS), identifies four areas that it considers crucial to the discipline of computer science: theory of computation, algorithms and data structures, programming methodology and languages, and computer elements and architecture. In addition to these four areas, CSAB also identifies fields such as software engineering, artificial intelligence, computer networking and communication, database systems, parallel computation, distributed computation, human–computer interaction, computer graphics, operating systems, and numerical and symbolic computation as being important areas of computer science.
+
+=== Theoretical computer science ===
+
+Theoretical computer science is mathematical and abstract in spirit, but it derives its motivation from practical and everyday computation. It aims to understand the nature of computation and, as a consequence of this understanding, provide more efficient methodologies.
+
+==== Theory of computation ====
+
+According to Peter Denning, the fundamental question underlying computer science is, "What can be automated?" Theory of computation is focused on answering fundamental questions about what can be computed and what amount of resources are required to perform those computations. In an effort to answer the first question, computability theory examines which computational problems are solvable on various theoretical models of computation. The second question is addressed by computational complexity theory, which studies the time and space costs associated with different approaches to solving a multitude of computational problems.
+The famous P = NP? problem, one of the Millennium Prize Problems, is an open problem in the theory of computation.
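The P = NP question asks whether every problem whose proposed solutions can be checked in polynomial time can also be solved in polynomial time. The sketch below makes that asymmetry concrete using subset sum, an NP-complete problem. Checking a proposed certificate (a set of indices) takes linear time, while the naive search below may try all 2**n subsets. The function names and sample numbers are illustrative choices. The slow search proves nothing about P versus NP; it only shows the gap the question is about.

```python
from itertools import combinations


def verify(numbers, target, certificate):
    """Check a proposed solution in time polynomial in the input size."""
    if len(set(certificate)) != len(certificate):
        return False  # each index may be used at most once
    if not all(0 <= i < len(numbers) for i in certificate):
        return False
    return sum(numbers[i] for i in certificate) == target


def search(numbers, target):
    """Brute force: may try all 2**n subsets of indices before answering."""
    for size in range(len(numbers) + 1):
        for subset in combinations(range(len(numbers)), size):
            if verify(numbers, target, subset):
                return subset
    return None


nums = [3, 34, 4, 12, 5, 2]
certificate = search(nums, 9)
print(certificate, verify(nums, 9, certificate))  # (2, 4) True
```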
+
+==== Information and coding theory ====
+
+Information theory, closely related to probability and statistics, is concerned with the quantification of information. It was developed by Claude Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and communicating data. Coding theory is the study of the properties of codes (systems for converting information from one form to another) and their fitness for a specific application. Codes are used for data compression, cryptography, error detection and correction, and more recently also for network coding. Codes are studied for the purpose of designing efficient and reliable data transmission methods.
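The two ideas in the paragraph above can be sketched in a few lines: Shannon entropy, which measures information content, and an error-detecting code. The entropy function uses the empirical symbol frequencies of a single message, which is a simplifying assumption. The code is a single even-parity bit, the simplest error-detecting code. It detects any one flipped bit but cannot correct it; correcting codes such as Hamming codes add more redundancy.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Entropy in bits per symbol of the message's empirical symbol distribution."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def add_even_parity(bits):
    """Append one bit so that the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(codeword):
    """Any single flipped bit makes the count of 1s odd, so it is detected."""
    return sum(codeword) % 2 == 0


print(shannon_entropy("abcd"))  # 2.0 bits: four equally likely symbols
codeword = add_even_parity([1, 0, 1, 1])  # [1, 0, 1, 1, 1]
corrupted = codeword.copy()
corrupted[0] ^= 1  # simulate a one-bit transmission error
print(parity_ok(codeword), parity_ok(corrupted))  # True False
```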
+
+==== Data structures and algorithms ====
+Data structures and algorithms are the studies of commonly used computational methods and their computational efficiency.
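As a small illustration of why computational efficiency matters, the sketch below compares linear search with binary search on one million sorted integers and counts how many elements each one inspects. Binary search needs the data to be kept sorted, a data-structure invariant, and in return it needs at most about log2(n) + 1 probes. The sample data is arbitrary.

```python
def linear_search(items, target):
    """O(n): inspect elements one by one; returns (index or -1, probes)."""
    for probes, value in enumerate(items, start=1):
        if value == target:
            return probes - 1, probes
    return -1, len(items)


def binary_search(sorted_items, target):
    """O(log n) on sorted input: halve the search interval on every probe."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


data = list(range(0, 2_000_000, 2))  # one million sorted even numbers
print(linear_search(data, 1_999_998))  # (999999, 1000000)
print(binary_search(data, 1_999_998))  # same index after at most 20 probes
```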
+
+==== Programming language theory and formal methods ====
--- a/data/en.wikipedia.org/wiki/Computer_science-3.md
+++ b/data/en.wikipedia.org/wiki/Computer_science-3.md
@ -0,0 +1,46 @@
+---
+title: "Computer science"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Computer_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:20.986122+00:00"
+instance: "kb-cron"
+---
+
+Programming language theory is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of programming languages and their individual features. It falls within the discipline of computer science, both depending on and affecting mathematics, software engineering, and linguistics. It is an active research area, with numerous dedicated academic journals.
+Formal methods are a particular kind of mathematically based technique for the specification, development and verification of software and hardware systems. The use of formal methods for software and hardware design is motivated by the expectation that, as in other engineering disciplines, performing appropriate mathematical analysis can contribute to the reliability and robustness of a design. They form an important theoretical underpinning for software engineering, especially where safety or security is involved. Formal methods are a useful adjunct to software testing since they help avoid errors and can also give a framework for testing. For industrial use, tool support is required. However, the high cost of using formal methods means that they are usually only used in the development of high-integrity and life-critical systems, where safety or security is of utmost importance. Formal methods are best described as the application of a fairly broad variety of theoretical computer science fundamentals, in particular logic calculi, formal languages, automata theory, and program semantics, but also type systems and algebraic data types to problems in software and hardware specification and verification.
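One widely used formal method is explicit-state model checking: enumerate every reachable state of a system and check a property in each one. The sketch below applies this to Peterson's two-process mutual-exclusion algorithm and checks that both processes are never in the critical section at once. The state encoding (program counters 0 to 4, two flags, a turn variable) is an illustrative assumption. Industrial tools such as SPIN or TLC handle far larger models and richer properties.

```python
from collections import deque

# Program counter per process:
# 0 set flag, 1 set turn, 2 wait, 3 critical section, 4 clear flag.


def successors(state):
    """Yield every state reachable in one atomic step by either process."""
    pcs, flags, turn = state
    for i in (0, 1):
        j = 1 - i
        pc, fl, t = list(pcs), list(flags), turn
        if pcs[i] == 0:
            fl[i] = True           # flag[i] = true
            pc[i] = 1
        elif pcs[i] == 1:
            t = j                  # turn = j
            pc[i] = 2
        elif pcs[i] == 2:
            if flags[j] and turn == j:
                continue           # busy-waiting: process i cannot advance
            pc[i] = 3
        elif pcs[i] == 3:
            pc[i] = 4              # leave the critical section
        else:
            fl[i] = False          # flag[i] = false
            pc[i] = 0
        yield (tuple(pc), tuple(fl), t)


def check_mutual_exclusion():
    """Breadth-first search of all reachable states for a violation."""
    init = ((0, 0), (False, False), 0)
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if state[0] == (3, 3):     # both processes in the critical section
            return False, state, len(seen)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, None, len(seen)


ok, counterexample, n_states = check_mutual_exclusion()
print(ok, n_states)
```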
+
+=== Applied computer science ===
+
+==== Computer graphics and visualization ====
+
+Computer graphics is the study of digital visual content and involves the synthesis and manipulation of image data. The study is connected to many other fields in computer science, including computer vision, image processing, and computational geometry, and is heavily applied in the fields of special effects and video games.
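A classic example of image synthesis is rasterization: deciding which pixels on a grid to light up to draw a shape. The sketch below implements Bresenham's line algorithm, which does this with integer arithmetic only. Returning a list of pixel coordinates, rather than writing to a real framebuffer, is a simplification for illustration.

```python
def bresenham(x0, y0, x1, y1):
    """Integer-only line rasterization; returns pixels from (x0, y0) to (x1, y1)."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # accumulated error term, updated without floating point
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return points


print(bresenham(0, 0, 2, 2))  # [(0, 0), (1, 1), (2, 2)]
```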
+
+==== Image and sound processing ====
+
+Information can take the form of images, sound, video or other multimedia. Bits of information can be streamed via signals. Information processing is the central notion of informatics, the European view on computing, which studies information processing algorithms independently of the type of information carrier, whether it is electrical, mechanical or biological. This field plays an important role in information theory, telecommunications and information engineering, and has applications in medical image computing and speech synthesis, among others. The question "What is the lower bound on the complexity of fast Fourier transform algorithms?" is one of the unsolved problems in theoretical computer science.
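The fast Fourier transform mentioned above computes the same result as the direct discrete Fourier transform, but in O(n log n) rather than O(n^2) operations. The sketch below shows the recursive radix-2 Cooley–Tukey variant next to the naive DFT and checks that they agree. Restricting input lengths to powers of two is a simplifying assumption; whether O(n log n) is optimal is the open question cited above.

```python
import cmath


def dft(x):
    """Direct DFT: n outputs, each a sum over n inputs, so O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly: combine half-size results
        out[k + n // 2] = even[k] - twiddle
    return out


signal = [1, 2, 3, 4, 0, 0, 0, 0]
print(max(abs(a - b) for a, b in zip(fft(signal), dft(signal))))  # tiny rounding error
```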
+
+==== Computational science, finance and engineering ====
+
+Scientific computing (or computational science) is the field of study concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems. A major usage of scientific computing is simulation of various processes, including computational fluid dynamics, physical, electrical, and electronic systems and circuits, societies and social situations (notably war games) along with their habitats, and interactions among biological cells. Modern computers enable optimization of such designs as complete aircraft. Notable in electrical and electronic circuit design is SPICE, as well as software for the physical realization of new (or modified) designs. The latter includes essential design software for integrated circuits.
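Circuit simulators integrate differential equations numerically. The sketch below is a deliberately minimal example: it charges an RC circuit with the forward Euler method and compares the result with the exact solution V(t) = Vs(1 - e^(-t/RC)). The component values and step size are arbitrary, and production simulators such as SPICE use more robust implicit integration schemes.

```python
import math


def simulate_rc_charge(v_source, r, c, t_end, dt):
    """Forward-Euler integration of dV/dt = (Vs - V) / (R*C), starting from V(0) = 0."""
    tau = r * c  # time constant in seconds
    v = 0.0
    for _ in range(int(round(t_end / dt))):
        v += dt * (v_source - v) / tau
    return v


# 5 V source, 1 kOhm, 1 mF -> tau = 1 s; simulate one time constant
v_num = simulate_rc_charge(5.0, 1_000.0, 1e-3, t_end=1.0, dt=1e-4)
v_exact = 5.0 * (1 - math.exp(-1.0))
print(f"Euler {v_num:.5f} V, exact {v_exact:.5f} V")
```

Shrinking `dt` reduces the Euler error roughly in proportion, at the cost of more steps.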
+
+==== Human–computer interaction ====
+
+Human–computer interaction (HCI) is the field of study and research concerned with the design and use of computer systems, mainly based on the analysis of the interaction between humans and computer interfaces. HCI has several subfields that focus on the relationship between emotions, social behavior and brain activity with computers.
+
+==== Software engineering ====
+
+Software engineering is the study of designing, implementing, and modifying software to ensure it is of high quality, affordable, maintainable, and fast to build. It is a systematic approach to software design, involving the application of engineering practices to software. Software engineering deals with the organizing and analyzing of software: it is concerned not just with the creation or manufacture of new software, but also with its internal arrangement and maintenance. Examples include software testing, systems engineering, technical debt, and software development processes.
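Software testing, one of the practices listed above, often starts with automated unit tests that pin down expected behaviour, including edge cases. The sketch below uses Python's standard `unittest` module. The `leap_year` function is a hypothetical unit under test chosen for illustration.

```python
import unittest


def leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTest(unittest.TestCase):
    def test_ordinary_leap_year(self):
        self.assertTrue(leap_year(2024))

    def test_common_year(self):
        self.assertFalse(leap_year(2023))

    def test_century_is_not_leap(self):
        self.assertFalse(leap_year(1900))  # the edge case a naive "% 4" check misses

    def test_every_400_years_is_leap(self):
        self.assertTrue(leap_year(2000))


if __name__ == "__main__":
    unittest.main(argv=["leap_year_tests"], exit=False)
```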
+
+==== Artificial intelligence ====
+
+Artificial intelligence (AI) aims to or is required to synthesize goal-orientated processes such as problem-solving, decision-making, environmental adaptation, learning, and communication found in humans and animals. From its origins in cybernetics and in the Dartmouth Conference (1956), artificial intelligence research has been necessarily cross-disciplinary, drawing on areas of expertise such as applied mathematics, symbolic logic, semiotics, electrical engineering, philosophy of mind, neurophysiology, and social intelligence. AI is associated in the popular mind with robotic development, but the main field of practical application has been as an embedded component in areas of software development, which require computational understanding. The starting point in the late 1940s was Alan Turing's question "Can computers think?", and the question remains effectively unanswered, although the Turing test is still used to assess computer output on the scale of human intelligence. But the automation of evaluative and predictive tasks has been increasingly successful as a substitute for human monitoring and intervention in domains of computer application involving complex real-world data.
+
+=== Computer systems ===
+
+==== Computer architecture and microarchitecture ====
+
+Computer architecture, or digital computer organization, is the conceptual design and fundamental operational structure of a computer system. It focuses largely on the way by which the central processing unit performs internally and accesses addresses in memory. Computer engineers study computational logic and design of computer hardware, from individual processor components, microcontrollers, personal computers to supercomputers and embedded systems. The term "architecture" in computer literature can be traced to the work of Lyle R. Johnson and Frederick P. Brooks Jr., members of the Machine Organization department in IBM's main research center in 1959.
+
+==== Concurrent, parallel and distributed computing ====
--- a/data/en.wikipedia.org/wiki/Computer_science-4.md
+++ b/data/en.wikipedia.org/wiki/Computer_science-4.md
@ -0,0 +1,73 @@
+---
+title: "Computer science"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Computer_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:20.986122+00:00"
+instance: "kb-cron"
+---
+
+Concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. A number of mathematical models have been developed for general concurrent computation including Petri nets, process calculi and the parallel random access machine model. When multiple computers are connected in a network while using concurrency, this is known as a distributed system. Computers within that distributed system have their own private memory, and information can be exchanged to achieve common goals.
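The distributed-system idea above (nodes with private memory that cooperate only by exchanging information) can be imitated in a single program. The sketch below uses threads that share nothing except message queues. This is an assumption made for illustration: real distributed systems send messages over a network and must also cope with delays and failures.

```python
import queue
import threading


def node(name, inbox, outbox):
    """Each node keeps a private total and communicates only via messages."""
    private_total = 0
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work
            outbox.put((name, private_total))
            return
        private_total += msg


inboxes = [queue.Queue() for _ in range(3)]
results = queue.Queue()
threads = [threading.Thread(target=node, args=(f"node{i}", inboxes[i], results))
           for i in range(3)]
for t in threads:
    t.start()
for n in range(1, 101):  # distribute the numbers 1..100 round-robin
    inboxes[n % 3].put(n)
for box in inboxes:
    box.put(None)
for t in threads:
    t.join()

partials = dict(results.get() for _ in threads)
print(partials, sum(partials.values()))  # partial sums per node; total 5050
```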
+
+==== Computer networks ====
+
+This branch of computer science studies the construction and behavior of computer networks. It addresses their performance, resilience, security, scalability, and cost-effectiveness, along with the variety of services they can provide.
+
+==== Computer security and cryptography ====
+
+Computer security is a branch of computer technology with the objective of protecting information from unauthorized access, disruption, or modification while maintaining the accessibility and usability of the system for its intended users.
+Historical cryptography is the art of writing and deciphering secret messages. Modern cryptography is the scientific study of problems relating to distributed computations that can be attacked. Technologies studied in modern cryptography include symmetric and asymmetric encryption, digital signatures, cryptographic hash functions, key-agreement protocols, blockchain, zero-knowledge proofs, and garbled circuits.
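Two of the building blocks listed above are available in Python's standard library: cryptographic hash functions (`hashlib`) and keyed message authentication (`hmac`). The sketch shows that a small change to the input produces an unrelated digest, and that a keyed tag lets a receiver who shares the key detect tampering. The key is a hypothetical placeholder. An HMAC is a symmetric construction, not a digital signature, because anyone holding the key can create valid tags.

```python
import hashlib
import hmac

message = b"attack at dawn"
digest = hashlib.sha256(message).hexdigest()
altered = hashlib.sha256(b"attack at dusk").hexdigest()
print(len(digest), digest == altered)  # 64 False


def make_tag(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256: a keyed hash that only holders of `key` can compute."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def check_tag(key: bytes, msg: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through early-exit timing
    return hmac.compare_digest(make_tag(key, msg), tag)


key = b"shared secret key"  # hypothetical key, for illustration only
tag = make_tag(key, message)
print(check_tag(key, message, tag), check_tag(key, b"attack at dusk", tag))  # True False
```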
+
+==== Databases and data mining ====
+
+A database is intended to organize, store, and retrieve large amounts of data easily. Digital databases are managed using database management systems to store, create, maintain, and search data, through database models and query languages. Data mining is a process of discovering patterns in large data sets.
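A minimal example of a database management system and query language in use: the sketch below runs SQLite through Python's standard `sqlite3` module with a throwaway in-memory database. The table and its rows are hypothetical sample data. The query shows the declarative style of SQL: it states what result is wanted, not how to scan the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE papers (title TEXT, venue TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?)",  # placeholders avoid SQL injection
    [("alpha", "conference", 2021), ("beta", "journal", 2022), ("gamma", "conference", 2023)],
)
by_venue = conn.execute(
    "SELECT venue, COUNT(*) FROM papers GROUP BY venue ORDER BY venue"
).fetchall()
recent = conn.execute(
    "SELECT title FROM papers WHERE year >= ? ORDER BY year", (2022,)
).fetchall()
conn.close()
print(by_venue)  # [('conference', 2), ('journal', 1)]
print(recent)    # [('beta',), ('gamma',)]
```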
+
+== Discoveries ==
+The philosopher of computing Bill Rapaport noted three Great Insights of Computer Science:
+
+Gottfried Wilhelm Leibniz's, George Boole's, Alan Turing's, Claude Shannon's, and Samuel Morse's insight: there are only two objects that a computer has to deal with in order to represent "anything".
+All the information about any computable problem can be represented using only 0 and 1 (or any other bistable pair that can flip-flop between two easily distinguishable states, such as "on/off", "magnetized/de-magnetized", "high-voltage/low-voltage", etc.).
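To make the first insight concrete, the sketch below turns text into a string of 0s and 1s and back again. It uses the UTF-8 encoding, which is one standard choice assumed here. Any agreed encoding would show the same point: the two symbols suffice once sender and receiver share the convention.

```python
def to_bits(text):
    """Encode text as UTF-8 bytes, then write each byte as 8 binary digits."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits):
    """Group the digits into bytes again and decode them as UTF-8."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


print(to_bits("Hi"))  # 0100100001101001
print(from_bits(to_bits("Böhm")))  # Böhm (the non-ASCII 'ö' takes two bytes)
```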
+
+Alan Turing's insight: there are only five actions that a computer has to perform in order to do "anything".
+Every algorithm can be expressed in a language for a computer consisting of only five basic instructions:
+move left one location;
+move right one location;
+read symbol at current location;
+print 0 at current location;
+print 1 at current location.
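The five primitive actions can be combined into a working Turing machine. The sketch below is a conventional transition-table simulator, where each entry says what to print, which way to move, and which state comes next, depending on the symbol read. The table encodes unary addition over the alphabet {0, 1}: 3 + 2 is written as 111 0 11. The state names, the table format, and treating unwritten cells as 0 are illustrative assumptions. The state-based control is exactly what the third insight below is about.

```python
from collections import defaultdict

# (state, symbol read) -> (symbol to print, head move, next state)
UNARY_ADD = {
    ("first", 1): (1, +1, "first"),    # walk right over the first number
    ("first", 0): (1, +1, "second"),   # print 1 into the gap between the numbers
    ("second", 1): (1, +1, "second"),  # walk right over the second number
    ("second", 0): (0, -1, "erase"),   # past the end: move back left
    ("erase", 1): (0, -1, "halt"),     # print 0 over one surplus 1, then stop
}


def run(program, tape_bits, state="first", max_steps=10_000):
    tape = defaultdict(int, enumerate(tape_bits))  # unwritten cells read as 0
    head = 0
    steps = 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = tape[head]                          # read symbol at current location
        write, move, state = program[(state, symbol)]
        tape[head] = write                           # print 0 or print 1
        head += move                                 # move left or right one location
        steps += 1
    return [tape[i] for i in range(min(tape), max(tape) + 1)]


print(run(UNARY_ADD, [1, 1, 1, 0, 1, 1]))  # [1, 1, 1, 1, 1, 0, 0], i.e. 5
```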
+
+Corrado Böhm and Giuseppe Jacopini's insight: there are only three ways of combining these actions (into more complex ones) that are needed in order for a computer to do "anything".
+Only three rules are needed to combine any set of basic instructions into more complex ones:
+sequence: first do this, then do that;
+selection: IF such-and-such is the case, THEN do this, ELSE do that;
+repetition: WHILE such-and-such is the case, DO this.
+The three rules of Böhm and Jacopini's insight can be further simplified with the use of goto (which means it is more elementary than structured programming).
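As an illustration of the three combining rules, the sketch below writes the subtraction form of Euclid's greatest-common-divisor algorithm using nothing but sequence, selection (IF/ELSE) and repetition (WHILE). The choice of algorithm is ours; the comments mark which rule each construct corresponds to.

```python
def gcd(a, b):
    # sequence: first normalize the inputs, then run the loop, then return
    a, b = abs(a), abs(b)
    # repetition: WHILE b is nonzero, DO the body
    while b != 0:
        # selection: IF a >= b THEN subtract, ELSE swap the two values
        if a >= b:
            a = a - b
        else:
            a, b = b, a
    return a


print(gcd(48, 18))  # 6
```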
+
+== Programming paradigms ==
+
+Programming languages can be used to accomplish different tasks in different ways. Common programming paradigms include:
+
+Functional programming, a style of building the structure and elements of computer programs that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It is a declarative programming paradigm, which means programming is done with expressions or declarations instead of statements.
+Imperative programming, a programming paradigm that uses statements that change a program's state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program consists of commands for the computer to perform. Imperative programming focuses on describing how a program operates.
+Object-oriented programming, a programming paradigm based on the concept of "objects", which may contain data, in the form of fields, often known as attributes; and code, in the form of procedures, often known as methods. A feature of objects is that an object's procedures can access and often modify the data fields of the object with which they are associated. Thus object-oriented computer programs are made out of objects that interact with one another.
+Service-oriented programming, a programming paradigm that uses "services" as the unit of computer work, to design and implement integrated business applications and mission critical software programs.
+Many languages offer support for multiple paradigms, making the distinction more a matter of style than of technical capabilities.
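Python is one of those multi-paradigm languages, so the same small task can be written in three of the styles above. The task, summing the squares of the even numbers, and the class name `SquareSummer` are illustrative choices. All three versions compute the same value; they differ in how state is handled.

```python
from functools import reduce


def imperative(numbers):
    total = 0  # mutable state, updated statement by statement
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def functional(numbers):
    # no mutation: compose filter, map and reduce as expressions
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, x: acc + x, squares, 0)


class SquareSummer:
    """Object-oriented: the running total (a field) is bundled with a method."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        if n % 2 == 0:
            self.total += n * n


def object_oriented(numbers):
    summer = SquareSummer()
    for n in numbers:
        summer.add(n)
    return summer.total


data = [1, 2, 3, 4, 5, 6]
print(imperative(data), functional(data), object_oriented(data))  # 56 56 56
```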
+
+== Research ==
+
+Conferences are important events for computer science research. During these conferences, researchers from the public and private sectors present their recent work and meet. Unlike in most other academic fields, in computer science, the prestige of conference papers is greater than that of journal publications. One proposed explanation for this is that the quick development of this relatively new field requires rapid review and distribution of results, a task better handled by conferences than by journals.
+
+== External links ==
+
+DBLP Computer Science Bibliography
+Association for Computing Machinery
+Institute of Electrical and Electronics Engineers
--- a/data/en.wikipedia.org/wiki/Confrontation_analysis-0.md
+++ b/data/en.wikipedia.org/wiki/Confrontation_analysis-0.md
@ -0,0 +1,52 @@
+---
+title: "Confrontation analysis"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Confrontation_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:22.286673+00:00"
+instance: "kb-cron"
+---
+
+Confrontation analysis (also known as dilemma analysis) is an operational analysis technique used to structure, understand, and analyze multi-party interactions, such as negotiations or conflicts. It serves as the mathematical foundation for drama theory. 
+While based on game theory, confrontation analysis differs in that it focuses on the idea that players may redefine the game during the interaction, often due to the influence of emotions. In traditional game theory, players generally work within a fixed set of rules (represented by a decision matrix). However, confrontation analysis sees the interaction as a sequence of linked decisions, where the rules or perceptions of the game can shift over time, influenced by emotional dilemmas or psychological factors that arise during the interaction. 
+
+== Derivation and use ==
+Confrontation analysis was devised by Professor Nigel Howard in the early 1990s, drawing on his work on game theory and metagame analysis. It has been applied to defence, political, legal, financial and commercial problems.
+
+Much of the theoretical background to General Rupert Smith's book The Utility of Force drew its inspiration from the theory of confrontation analysis. Smith writes: "I am in debt to Professor Nigel Howard, whose explanation of Confrontation Analysis and Game Theory at a seminar in 1998 excited my interest. Our subsequent discussions helped me to order my thoughts and the lessons I had learned into a coherent structure with the result that, for the first time, I was able to understand my experiences within a theoretical model which allowed me to use them further."
+Confrontation analysis can also be used in a decision workshop as a structure to support role-playing for training, analysis and decision rehearsal.
+Confrontation analysis was continually developed by Professor Nigel Howard during his lifetime and was considerably revised and simplified (from Version 1 to Version 2) a year or so before his death. As a result, much of what he wrote concerns Version 1, although much of the follow-on work since then has embraced Version 2. Minor changes have since brought the current version up to 2.5. Because of this history, AI systems asked about confrontation analysis often hallucinate in their answers. This can be partially mitigated by asking more specific questions (e.g. "Describe the dilemmas in Version 2.5 of Confrontation Analysis" rather than "Describe the dilemmas in Confrontation Analysis").
+
+== Method (Version 2.5) ==
+
+Confrontation analysis looks on an interaction as a sequence of confrontations. During each confrontation the parties communicate until they have made their positions clear to one another. These positions can be expressed as an options table (also known as a card table) of yes/no decisions. For each decision each party communicates what they would like to happen (their position) and what will happen if they cannot agree (the threatened future). These interactions produce precisely defined dilemmas, and the options table changes as players attempt to eliminate these.
+
+Consider the example on the right (Initial Card Table), taken from the 2023 Gaza Conflict. This represents an interaction between Hamas and Israel over the Gaza conflict.
+Each side had a position as to what they wanted to happen:
+Hamas wanted (see 4th column):
+
+Israel to give back all its land to the Palestinians
+Israel wanted (see 5th column):
+
+NOT to give back all its land to the Palestinians
+For the Palestinians NOT to be able to attack Israel effectively
+If no further changes were made then what the sides were saying would happen was (see 1st column):
+
+Israel would NOT give back all its land to the Palestinians
+Hamas would continue to fire rockets at Israel
+Hamas would continue to attack Israel effectively
+Confrontation analysis then specifies a number of precisely defined dilemmas that follow from the structure of the card tables. It states that, motivated by the desire to eliminate these dilemmas, the parties involved will change the options table.
+In the situation at the start, Israel has one dilemma and Hamas has two. Israel has a persuasion dilemma in that it wants Hamas not to attack it effectively. Hamas has a persuasion dilemma in that Israel has not given back all its land to the Palestinians. It also has a sufficiency dilemma in that its rocket attacks on Israel were not putting enough pressure on Israel for it to act on this.
+Faced with these dilemmas, Israel modified the options table to eliminate its dilemma. It attacked Gaza to destroy Hamas, so that Hamas would be unable to attack it effectively again.
+
+The options table was then modified to that shown on the right:
+
+Israel has eliminated its dilemma as it no longer thinks Hamas is able to attack it effectively (even if Hamas claims it can). This is shown by the small blue box with a cross in it in the second column (we put the small box in the second column if the party doubts the other party is able, and in the first column if the party doubts the other party is willing)
+Hamas has gained a rejection dilemma, as its threat to attack Israel effectively is no longer credible.
+Israel now has no dilemmas, so it is politically content with the situation. However, Hamas still has dilemmas and will struggle to eliminate them.
+The threatened future (that which will happen if nothing changes) is now:
+Israel will not give all the land of Israel back to the Palestinians.
+Hamas will continue to fire rockets at Israel.
+Hamas says it will continue to attack Israel effectively, but Israel doubts it thinking it is unable.
+Israel will continue to destroy Hamas in Gaza.
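An options table can be represented as data, and some dilemmas can then be found mechanically. The sketch below encodes each yes/no decision with the party who takes it, each party's position, and the threatened future. It then flags persuasion dilemmas, read here in simplified form from the examples above: a party wants another party's decision to go differently from the threatened future. The class and function names and the two-party dispute are hypothetical. The other dilemma types, and the "doubt" boxes, are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Option:
    holder: str                  # party who takes this yes/no decision
    description: str
    positions: dict[str, bool]   # party -> outcome that party wants
    threatened: bool             # outcome if no agreement is reached


def persuasion_dilemmas(table, party):
    """Options held by another party where `party` wants a different outcome
    from the threatened future (a simplified reading of the examples above)."""
    return [
        opt.description
        for opt in table
        if opt.holder != party
        and party in opt.positions
        and opt.positions[party] != opt.threatened
    ]


# Hypothetical two-party dispute, not taken from the examples above.
table = [
    Option("B", "B pays compensation", {"A": True, "B": False}, threatened=False),
    Option("A", "A files a lawsuit", {"A": False, "B": False}, threatened=True),
]
print(persuasion_dilemmas(table, "A"))  # ['B pays compensation']
print(persuasion_dilemmas(table, "B"))  # ['A files a lawsuit']
```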
--- a/data/en.wikipedia.org/wiki/Confrontation_analysis-1.md
+++ b/data/en.wikipedia.org/wiki/Confrontation_analysis-1.md
@ -0,0 +1,31 @@
+---
+title: "Confrontation analysis"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Confrontation_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:22.286673+00:00"
+instance: "kb-cron"
+---
+
+A second example on the left shows the other two dilemmas: the trust dilemma and the co-operation dilemma. This example is taken from the lead-up to the 2003 Iraq war.
+Here the USA thought that Saddam Hussein was developing weapons of mass destruction. Saddam said he was not, but the USA doubted this and thought that he was. They therefore invaded Iraq.
+Confrontation analysis does not necessarily produce a win-win solution (although end states are more likely to remain stable if they do); however, the word confrontation should not necessarily imply that any negotiations should be carried out in an aggressive way.
+The card tables are isomorphic to game theory models. The aim is to find the dilemmas facing participants and so help to forecast how the participants might change the options table to eliminate them. The forecast requires both analysis of the model and its dilemmas, and also exploration of the reality outside the model; both will show the strategies the participants might use to change the options table to eliminate dilemmas.
+Sometimes analysis of the ticks and crosses can be supported by values showing the payoff to each of the parties.
+
+== External links ==
+ConfrontationAnalysis.info - A public domain website with instructions about how to do Version 2.5 of confrontation analysis with a tutorial page.
+Decision Workshops - A public domain website using Version 2.2 of confrontation analysis with many examples
+Dilemma Explorer - A software application to do Version 2.5 of Confrontation Analysis
+Confrontation Manager - A software application using Version 1 of Confrontation Analysis.
+Confronteer - An iPhone app to do Version 2 of Confrontation Analysis.
+Smith R, Tait A, Howard N (1999) Confrontations in War and Peace Contains a description of Operation Desert Storm and the 1996 Bosnian Crisis using Version 1 of Confrontation Analysis.
+N. Howard, 'Confrontation Analysis: How to win operations other than war', CCRP Publications, 1999 (used Version 1 of Confrontation Analysis)
+P. Bennett, J. Bryant and N. Howard, 'Drama Theory and Confrontation Analysis' — can be found (along with other recent PSM methods) in: J. V. Rosenhead and J. Mingers (eds) Rational Analysis for a Problematic World Revisited: problem structuring methods for complexity, uncertainty and conflict, Wiley, 2001. (Version 1)
+J. Bryant, The Six Dilemmas of Collaboration: inter-organisational relationships as drama, Wiley, 2003. (Version 1)
+N. Howard, 'Paradoxes of Rationality', MIT Press, 1971 (Version 1)
+How to structure disputes using Confrontation Analysis contains an illustrated explanation of Version 2 of Confrontation Analysis.
+Speed Confrontation Management a brief "How to" manual on doing Confrontation Analysis without using an Options Table.
--- a/data/en.wikipedia.org/wiki/Cryptography-0.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-0.md
@ -0,0 +1,16 @@
+---
+title: "Cryptography"
+chunk: 1/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+Cryptography, or cryptology, is the practice and study of techniques for secure communication in the presence of adversarial behavior. More generally, cryptography is about constructing and analyzing protocols that prevent third parties or the public from reading private messages. Modern cryptography exists at the intersection of the disciplines of mathematics, computer science, information security, electrical engineering, digital signal processing, physics, and others. Core concepts related to information security (data confidentiality, data integrity, authentication and non-repudiation) are also central to cryptography. Practical applications of cryptography include electronic commerce, chip-based payment cards, digital currencies, computer passwords and military communications.
+Cryptography prior to the modern age was effectively synonymous with encryption, converting readable information (plaintext) to unintelligible nonsense text (ciphertext), which can only be read by reversing the process (decryption). The sender of an encrypted (coded) message shares the decryption (decoding) technique only with the intended recipients to preclude access from adversaries. The cryptography literature often uses the names "Alice" (or "A") for the sender, "Bob" (or "B") for the intended recipient, and "Eve" (or "E") for the eavesdropping adversary. Since the development of rotor cipher machines in World War I and the advent of computers in World War II, cryptography methods have become increasingly complex and their applications more varied.
+Modern cryptography is heavily based on mathematical theory and computer science practice; cryptographic algorithms are designed around computational hardness assumptions, making such algorithms hard to break in actual practice by any adversary. While it is theoretically possible to break into a well-designed system, it is infeasible in actual practice to do so. Such schemes, if well designed, are therefore termed "computationally secure". Theoretical advances (e.g., improvements in integer factorization algorithms) and faster computing technology require these designs to be continually reevaluated and, if necessary, adapted. Information-theoretically secure schemes that provably cannot be broken even with unlimited computing power, such as the one-time pad, are much more difficult to use in practice than the best theoretically breakable but computationally secure schemes.
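The one-time pad mentioned above is simple to state: XOR each message byte with a key byte, where the key is random, as long as the message, and never reused. The sketch below implements it. It draws the key from the operating system's CSPRNG via Python's `secrets` module, a practical stand-in for the truly random key the security proof assumes. The final lines show one reason the scheme is hard to use in practice: reusing a key leaks the XOR of the two plaintexts.

```python
import secrets


def otp_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    # Key: as long as the message, from the OS CSPRNG, to be used only once.
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return ciphertext, key


def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse: (p ^ k) ^ k == p
    return bytes(c ^ k for c, k in zip(ciphertext, key))


ct, key = otp_encrypt(b"attack at dawn")
print(otp_decrypt(ct, key))  # b'attack at dawn'

# Key reuse breaks the scheme: XOR of the two ciphertexts equals XOR of the plaintexts.
ct2 = bytes(p ^ k for p, k in zip(b"retreat at six", key))
leak = bytes(a ^ b for a, b in zip(ct, ct2))
```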
+The growth of cryptographic technology has raised a number of legal issues in the Information Age. Cryptography's potential for use as a tool for espionage and sedition has led many governments to classify it as a weapon and to limit or even prohibit its use and export. In some jurisdictions where the use of cryptography is legal, laws permit investigators to compel the disclosure of encryption keys for documents relevant to an investigation. Cryptography also plays a major role in digital rights management and copyright infringement disputes with regard to digital media.
+
+== Terminology ==
--- a/data/en.wikipedia.org/wiki/Cryptography-1.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-1.md
@ -0,0 +1,23 @@
+---
+title: "Cryptography"
+chunk: 2/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+The first use of the term "cryptograph" (as opposed to "cryptogram") dates back to the 19th century – originating from "The Gold-Bug", a story by Edgar Allan Poe. The etymology of the term goes back to the Greek language; it is formed by two constituents: "crypton" (hidden), and "grapho" (to write).
+Until modern times, cryptography referred almost exclusively to "encryption", which is the process of converting ordinary information (called plaintext) into an unintelligible form (called ciphertext). Decryption is the reverse, in other words, moving from the unintelligible ciphertext back to plaintext. A cipher (or cypher) is a pair of algorithms that carry out the encryption and the reversing decryption. The detailed operation of a cipher is controlled both by the algorithm and, in each instance, by a "key". The key is a secret (ideally known only to the communicants), usually a string of characters (ideally short so it can be remembered by the user), which is needed to decrypt the ciphertext. In formal mathematical terms, a "cryptosystem" is the ordered list of elements: a finite set of possible plaintexts, a finite set of possible ciphertexts, a finite set of possible keys, and the encryption and decryption algorithms that correspond to each key. Keys are important both formally and in actual practice, as ciphers without variable keys can be trivially broken with only the knowledge of the cipher used and are therefore useless (or even counter-productive) for most purposes. Historically, ciphers were often used directly for encryption or decryption without additional procedures such as authentication or integrity checks.
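The Caesar shift cipher illustrates the role of the key. The algorithm (shift each letter) is public, and the key (the shift amount) selects one of 26 transformations. The sketch below shows that a keyspace this small gives almost no protection: an attacker who knows the algorithm simply tries every key. The alphabet handling (uppercase A to Z, other characters unchanged) is an illustrative choice.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text, key):
    """Shift each letter by `key` places (mod 26); other characters pass through."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)
    return "".join(out)


ciphertext = caesar("ATTACK AT DAWN", 3)
print(ciphertext)  # DWWDFN DW GDZQ
# Exhaustive key search: only 26 candidates, one of which is the plaintext.
candidates = [caesar(ciphertext, -k) for k in range(26)]
```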
+There are two main types of cryptosystems: symmetric and asymmetric. In symmetric systems, the only ones known until the 1970s, the same secret key encrypts and decrypts a message. Data manipulation in symmetric systems is significantly faster than in asymmetric systems. Asymmetric systems use a "public key" to encrypt a message and a related "private key" to decrypt it. The advantage of asymmetric systems is that the public key can be freely published, allowing parties to establish secure communication without having a shared secret key. In practice, asymmetric systems are used to first exchange a secret key, and then secure communication proceeds via a more efficient symmetric system using that key. Examples of asymmetric systems include Diffie–Hellman key exchange, RSA (Rivest–Shamir–Adleman), ECC (Elliptic Curve Cryptography), and Post-quantum cryptography. Secure symmetric algorithms include the commonly used AES (Advanced Encryption Standard) which replaced the older DES (Data Encryption Standard). Insecure symmetric algorithms include children's language tangling schemes such as Pig Latin or other cant, and all historical cryptographic schemes, however seriously intended, prior to the invention of the one-time pad early in the 20th century.
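The hybrid pattern described above (an asymmetric exchange first, then a symmetric key) can be sketched with textbook Diffie–Hellman. Each side publishes g^x mod p, and both sides arrive at the same shared value without ever transmitting it. The parameters here are toy assumptions: the Mersenne prime 2**127 - 1 and base 3 are far too weak for real use, where standardized groups (such as the MODP groups of RFC 3526) or elliptic curves are used. Hashing the shared value with SHA-256 is a simplified stand-in for a proper key-derivation function.

```python
import hashlib
import secrets

# Toy public parameters, for illustration only (NOT secure).
P = 2**127 - 1
G = 3


def keypair():
    private = secrets.randbelow(P - 2) + 1  # secret exponent in [1, P - 2]
    public = pow(G, private, P)             # safe to publish
    return private, public


a_priv, a_pub = keypair()  # Alice
b_priv, b_pub = keypair()  # Bob
shared_a = pow(b_pub, a_priv, P)  # Alice combines Bob's public value with her secret
shared_b = pow(a_pub, b_priv, P)  # Bob does the reverse and gets the same number

# Derive a 256-bit symmetric session key (e.g. for AES) from the shared secret.
session_key = hashlib.sha256(shared_a.to_bytes(16, "big")).digest()
print(shared_a == shared_b, len(session_key))  # True 32
```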
+In colloquial use, the term "code" is often used to mean any method of encryption or concealment of meaning. However, in cryptography, code has a more specific meaning: the replacement of a unit of plaintext (i.e., a meaningful word or phrase) with a code word (for example, "wallaby" replaces "attack at dawn"). A cipher, in contrast, is a scheme for changing or substituting an element below such a level (a letter, a syllable, or a pair of letters, etc.) to produce a ciphertext.
+Cryptanalysis is the term used for the study of methods for obtaining the meaning of encrypted information without access to the key normally required to do so; i.e., it is the study of how to "crack" encryption algorithms or their implementations.
+Some use the terms "cryptography" and "cryptology" interchangeably in English, while others (including US military practice generally) use "cryptography" to refer specifically to the use and practice of cryptographic techniques and "cryptology" to refer to the combined study of cryptography and cryptanalysis. English is more flexible than several other languages in which "cryptology" (done by cryptologists) is always used in the second sense above. RFC 2828 advises that steganography is sometimes included in cryptology.
+The study of characteristics of languages that have some application in cryptography or cryptology (e.g. frequency data, letter combinations, universal patterns, etc.) is called cryptolinguistics. Cryptolinguistics is especially used in military intelligence applications for deciphering foreign communications.
+
+== History ==
+
+Before the modern era, cryptography focused on message confidentiality (i.e., encryption)—conversion of messages from a comprehensible form into an incomprehensible one and back again at the other end, rendering it unreadable by interceptors or eavesdroppers without secret knowledge (namely the key needed for decryption of that message). Encryption attempted to ensure secrecy in communication, such as those of spies, military leaders, and diplomats. In recent decades, the field has expanded beyond confidentiality concerns to include techniques for message integrity checking, sender/receiver identity authentication, digital signatures, interactive proofs and secure computation, among others.
+
+=== Classic cryptography ===
--- a/data/en.wikipedia.org/wiki/Cryptography-10.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-10.md
@ -0,0 +1,51 @@
+---
+title: "Cryptography"
+chunk: 11/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+In the United Kingdom, the Regulation of Investigatory Powers Act gives UK police the powers to force suspects to decrypt files or hand over passwords that protect encryption keys. Failure to comply is an offense in its own right, punishable on conviction by a two-year jail sentence or up to five years in cases involving national security. Successful prosecutions have occurred under the Act; the first, in 2009, resulted in a term of 13 months' imprisonment. Similar forced disclosure laws in Australia, Finland, France, and India compel individual suspects under investigation to hand over encryption keys or passwords during a criminal investigation.
+In the United States, the federal criminal case of United States v. Fricosu addressed whether a search warrant can compel a person to reveal an encryption passphrase or password. The Electronic Frontier Foundation (EFF) argued that this is a violation of the protection from self-incrimination given by the Fifth Amendment. In 2012, the court ruled that under the All Writs Act, the defendant was required to produce an unencrypted hard drive for the court.
+In many jurisdictions, the legal status of forced disclosure remains unclear.
+The 2016 FBI–Apple encryption dispute concerns the ability of courts in the United States to compel manufacturers' assistance in unlocking cell phones whose contents are cryptographically protected.
+As a potential counter-measure to forced disclosure, some cryptographic software supports plausible deniability, where the encrypted data is indistinguishable from unused random data (for example, the contents of a drive that has been securely wiped).
+
+== See also ==
+Collision attack
+Comparison of cryptography libraries
+Cryptovirology – Securing and encrypting virology
+Crypto Wars – Attempts to limit access to strong cryptography
+Encyclopedia of Cryptography and Security – Book by Technische Universiteit Eindhoven
+Global surveillance – Mass surveillance across national borders
+Indistinguishability obfuscation – Type of cryptographic software obfuscation
+Information theory – Scientific study of digital information
+Outline of cryptography
+List of cryptographers
+List of multiple discoveries
+List of cryptography books
+List of open-source Cypherpunk software
+List of unsolved problems in computer science – List of unsolved computational problems
+Pre-shared key – Method to set encryption keys
+Quantum cryptography – Cryptography based on quantum mechanical phenomena
+Secure cryptoprocessor
+Strong cryptography – Term applied to cryptographic systems that are highly resistant to cryptanalysis
+Syllabical and Steganographical Table – Eighteenth-century work believed to be the first cryptography chart
+World Wide Web Consortium's Web Cryptography API – World Wide Web Consortium cryptography standard
+
+== External links ==
+
+The dictionary definition of cryptography at Wiktionary
+Media related to Cryptography at Wikimedia Commons
+Cryptography on In Our Time at the BBC
+Crypto Glossary and Dictionary of Technical Cryptography Archived 4 July 2022 at the Wayback Machine
+A Course in Cryptography by Raphael Pass & Abhi Shelat – offered at Cornell in the form of lecture notes.
+For more on the use of cryptographic elements in fiction, see: Dooley, John F. (23 August 2012). "Cryptology in Fiction". Archived from the original on 29 July 2020. Retrieved 20 February 2015.
+The George Fabyan Collection at the Library of Congress has early editions of works of seventeenth-century English literature, publications relating to cryptography.
--- a/data/en.wikipedia.org/wiki/Cryptography-2.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-2.md
@ -0,0 +1,17 @@
+---
+title: "Cryptography"
+chunk: 3/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+The main classical cipher types are transposition ciphers, which rearrange the order of letters in a message (e.g., 'hello world' becomes 'ehlol owrdl' in a trivially simple rearrangement scheme), and substitution ciphers, which systematically replace letters or groups of letters with other letters or groups of letters (e.g., 'fly at once' becomes 'gmz bu podf' by replacing each letter with the one following it in the Latin alphabet). Simple versions of either have never offered much confidentiality from enterprising opponents. An early substitution cipher was the Caesar cipher, in which each letter in the plaintext was replaced by a letter three positions further down the alphabet. Suetonius reports that Julius Caesar used it with a shift of three to communicate with his generals. Atbash is an example of an early Hebrew cipher. The earliest known use of cryptography is some carved ciphertext on stone in Egypt (c. 1900 BCE), but this may have been done for the amusement of literate observers rather than as a way of concealing information.
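+
+A minimal Python sketch of the shift substitution described above, assuming lowercase Latin letters only; the function names `shift_encrypt` and `shift_decrypt` are illustrative, not from any library. A shift of 1 reproduces the 'fly at once' example, and a shift of 3 is the Caesar variant.
+
+```python
+import string
+
+ALPHABET = string.ascii_lowercase
+
+
+def shift_encrypt(plaintext: str, shift: int) -> str:
+    """Replace each lowercase letter with the letter `shift` places later, wrapping around."""
+    out = []
+    for ch in plaintext:
+        if ch in ALPHABET:
+            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
+        else:
+            out.append(ch)  # spaces and punctuation are left unchanged
+    return "".join(out)
+
+
+def shift_decrypt(ciphertext: str, shift: int) -> str:
+    """Decrypting is just shifting back by the same amount."""
+    return shift_encrypt(ciphertext, -shift)
+
+
+print(shift_encrypt("fly at once", 1))     # gmz bu podf
+print(shift_encrypt("attack at dawn", 3))  # dwwdfn dw gdzq
+```
+
+With only 26 possible shifts, an attacker can simply try every one, which is one reason such simple schemes offered so little confidentiality.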
+The Greeks of Classical times are said to have known of ciphers (e.g., the scytale transposition cipher claimed to have been used by the Spartan military). Steganography (i.e., hiding even the existence of a message so as to keep it confidential) was also first developed in ancient times. An early example, from Herodotus, was a message tattooed on a slave's shaved head and concealed under the regrown hair. Other steganography methods involve 'hiding in plain sight,' such as using a music cipher to disguise an encrypted message within a regular piece of sheet music. More modern examples of steganography include the use of invisible ink, microdots, and digital watermarks to conceal information.
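+
+The scytale is usually modeled as a columnar transposition: the letters themselves are unchanged, only reordered. The sketch below assumes the rod's thickness corresponds to a fixed row width `key` and pads short messages with 'x'; both the parameterization and the padding are assumptions of this illustration.
+
+```python
+def scytale_encrypt(plaintext: str, key: int, pad: str = "x") -> str:
+    """Write the text in rows of `key` letters, then read it off column by column."""
+    if len(plaintext) % key:
+        plaintext += pad * (key - len(plaintext) % key)  # fill the last row
+    return "".join(plaintext[i::key] for i in range(key))
+
+
+def scytale_decrypt(ciphertext: str, key: int) -> str:
+    """Undo the transposition; any padding letters remain at the end."""
+    rows = len(ciphertext) // key
+    return "".join(ciphertext[i::rows] for i in range(rows))
+
+
+print(scytale_encrypt("hello", 3))  # hleolx
+```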
+In India, the 2000-year-old Kama Sutra of Vātsyāyana speaks of two different kinds of ciphers called Kautiliyam and Mulavediya. In the Kautiliyam, the cipher letter substitutions are based on phonetic relations, such as vowels becoming consonants. In the Mulavediya, the cipher alphabet consists of pairing letters and using the reciprocal ones.
+In Sassanid Persia, there were two secret scripts, according to the Muslim author Ibn al-Nadim: the šāh-dabīrīya (literally "King's script") which was used for official correspondence, and the rāz-saharīya which was used to communicate secret messages with other countries.
+David Kahn notes in The Codebreakers that modern cryptology originated among the Arabs, the first people to systematically document cryptanalytic methods. Al-Khalil (717–786) wrote the Book of Cryptographic Messages, which contains the first use of permutations and combinations to list all possible Arabic words with and without vowels.
+
+Ciphertexts produced by a classical cipher (and some modern ciphers) will reveal statistical information about the plaintext, and that information can often be used to break the cipher. After the discovery of frequency analysis, nearly all such ciphers could be broken by an informed attacker. Such classical ciphers still enjoy popularity today, though mostly as puzzles (see cryptogram). The Arab mathematician and polymath Al-Kindi wrote a book on cryptography entitled Risalah fi Istikhraj al-Mu'amma (Manuscript on Deciphering Cryptographic Messages), which described the first known use of frequency analysis as a cryptanalytic technique.
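+
+Frequency analysis can be sketched against a simple shift cipher: try every shift and keep the candidate whose letter counts best match typical English, scored here with a chi-squared statistic. The frequency table holds approximate, rounded values for English, and chi-squared scoring is one common choice among several; both are assumptions of this illustration rather than a reconstruction of Al-Kindi's own procedure.
+
+```python
+import string
+from collections import Counter
+
+ALPHABET = string.ascii_lowercase
+# Approximate relative frequencies of letters in English text, in percent (rounded).
+ENGLISH_FREQ = {
+    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
+    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
+    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
+    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
+}
+
+
+def shift(text: str, k: int) -> str:
+    """Shift lowercase letters by k positions; other characters pass through."""
+    return "".join(
+        ALPHABET[(ALPHABET.index(c) + k) % 26] if c in ALPHABET else c for c in text
+    )
+
+
+def chi_squared(text: str) -> float:
+    """Lower scores mean the letter counts look more like ordinary English."""
+    letters = [c for c in text if c in ALPHABET]
+    if not letters:
+        return float("inf")
+    counts = Counter(letters)
+    total = len(letters)
+    score = 0.0
+    for c in ALPHABET:
+        expected = total * ENGLISH_FREQ[c] / 100
+        score += (counts[c] - expected) ** 2 / expected
+    return score
+
+
+def crack_shift(ciphertext: str) -> tuple:
+    """Return (best_shift, plaintext) by trying all 26 shifts."""
+    best = min(range(26), key=lambda k: chi_squared(shift(ciphertext, -k)))
+    return best, shift(ciphertext, -best)
+```
+
+The method needs a reasonable amount of ciphertext; on a handful of letters the counts are too noisy for the best-scoring shift to be reliable.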
--- a/data/en.wikipedia.org/wiki/Cryptography-3.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-3.md
@ -0,0 +1,20 @@
+---
+title: "Cryptography"
+chunk: 4/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+Language letter frequencies may offer little help for some extended historical encryption techniques such as the homophonic cipher, which tends to flatten the frequency distribution. For those ciphers, language letter group (or n-gram) frequencies may provide an attack.
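+
+Counting letter groups is the first step of such an n-gram attack. This sketch only tallies overlapping n-grams; matching them against reference statistics for the language, which a real attack would need, is left out, and the function name `ngram_counts` is illustrative.
+
+```python
+from collections import Counter
+
+
+def ngram_counts(text: str, n: int = 2) -> Counter:
+    """Count overlapping groups of n letters, ignoring case and non-letters."""
+    letters = "".join(c for c in text.lower() if c.isalpha())
+    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
+
+
+counts = ngram_counts("The theory then")
+print(counts["th"], counts["he"])  # 3 3
+```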
+Essentially all ciphers remained vulnerable to cryptanalysis using the frequency analysis technique until the development of the polyalphabetic cipher, most clearly by Leon Battista Alberti around the year 1467, though there is some indication that it was already known to Al-Kindi. Alberti's innovation was to use different ciphers (i.e., substitution alphabets) for various parts of a message (perhaps for each successive plaintext letter at the limit). He also invented what was probably the first automatic cipher device, a wheel that implemented a partial realization of his invention. In the Vigenère cipher, a polyalphabetic cipher, encryption uses a key word, which controls letter substitution depending on which letter of the key word is used. In the mid-19th century Charles Babbage showed that the Vigenère cipher was vulnerable to Kasiski examination, but this was first published about ten years later by Friedrich Kasiski.
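+
+A minimal sketch of Vigenère encryption, assuming lowercase Latin letters numbered a = 0 through z = 25: each plaintext letter is shifted by the key-word letter at the same position, and the key word repeats. The function name and the choice to skip non-letters without consuming a key letter are assumptions of this illustration.
+
+```python
+import string
+
+ALPHABET = string.ascii_lowercase
+
+
+def vigenere(text: str, key: str, decrypt: bool = False) -> str:
+    """Shift each letter by the matching letter of the repeating key word."""
+    shifts = [ALPHABET.index(k) for k in key.lower()]
+    sign = -1 if decrypt else 1
+    out, i = [], 0
+    for ch in text.lower():
+        if ch in ALPHABET:
+            s = shifts[i % len(shifts)]  # key letter in use for this position
+            out.append(ALPHABET[(ALPHABET.index(ch) + sign * s) % 26])
+            i += 1
+        else:
+            out.append(ch)  # non-letters are copied and use no key letter
+    return "".join(out)
+
+
+print(vigenere("attack at dawn", "lemon"))  # lxfopv ef rnhr
+```
+
+Because the key word repeats, identical plaintext fragments that line up with the same key position encrypt identically; the spacing between such repeats is what Kasiski examination exploits to recover the key length.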
+Although frequency analysis can be a powerful and general technique against many ciphers, encryption has still often been effective in practice, as many a would-be cryptanalyst was unaware of the technique. Breaking a message without using frequency analysis essentially required knowledge of the cipher used and perhaps of the key involved, thus making espionage, bribery, burglary, defection, etc., more attractive approaches to the cryptanalytically uninformed. It was finally explicitly recognized in the 19th century that secrecy of a cipher's algorithm is neither a sensible nor a practical safeguard of message security; in fact, it was further realized that any adequate cryptographic scheme (including ciphers) should remain secure even if the adversary fully understands the cipher algorithm itself. The secrecy of the key alone should be sufficient for a good cipher to maintain confidentiality under an attack. This fundamental principle was first explicitly stated in 1883 by Auguste Kerckhoffs and is generally called Kerckhoffs's Principle; alternatively and more bluntly, it was restated by Claude Shannon, the inventor of information theory and the fundamentals of theoretical cryptography, as Shannon's Maxim: 'the enemy knows the system'.
+Different physical devices and aids have been used to assist with ciphers. One of the earliest may have been the scytale of ancient Greece, a rod supposedly used by the Spartans as an aid for a transposition cipher. In medieval times, other aids were invented such as the cipher grille, which was also used for a kind of steganography. With the invention of polyalphabetic ciphers came more sophisticated aids such as Alberti's own cipher disk, Johannes Trithemius' tabula recta scheme, and Thomas Jefferson's wheel cypher (not publicly known, and reinvented independently by Bazeries around 1900). Many mechanical encryption/decryption devices were invented early in the 20th century, and several patented, among them rotor machines—famously including the Enigma machine used by the German government and military from the late 1920s and during World War II. The ciphers implemented by better quality examples of these machine designs brought about a substantial increase in cryptanalytic difficulty after WWI.
+
+=== Early computer-era cryptography ===
+Cryptanalysis of the new mechanical ciphering devices proved to be both difficult and laborious. In the United Kingdom, cryptanalytic efforts at Bletchley Park during WWII spurred the development of more efficient means for carrying out repetitive tasks, such as military code breaking (decryption). This culminated in the development of the Colossus, the world's first fully electronic, digital, programmable computer, which assisted in the decryption of ciphers generated by the German Army's Lorenz SZ40/42 machine.
+Extensive open academic research into cryptography is relatively recent, beginning in the mid-1970s. In the early 1970s IBM personnel designed the Data Encryption Standard (DES) algorithm that became the first federal government cryptography standard in the United States. In 1976 Whitfield Diffie and Martin Hellman published the Diffie–Hellman key exchange algorithm. In 1977 the RSA algorithm was published in Martin Gardner's Scientific American column. Since then, cryptography has become a widely used tool in communications, computer networks, and computer security generally.
+Some modern cryptographic techniques can only keep their keys secret if certain mathematical problems are intractable, such as the integer factorization or the discrete logarithm problems, so there are deep connections with abstract mathematics. There are very few cryptosystems that are proven to be unconditionally secure. The one-time pad is one, and was proven to be so by Claude Shannon. There are a few important algorithms that have been proven secure under certain assumptions. For example, the infeasibility of factoring extremely large integers is the basis for believing that RSA is secure, and some other systems, but even so, proof of unbreakability is unavailable since the underlying mathematical problem remains open. In practice, these are widely used, and are believed unbreakable in practice by most competent observers. There are systems similar to RSA, such as one by Michael O. Rabin, that are provably secure provided that factoring n = pq is infeasible; it is, however, quite unusable in practice. The discrete logarithm problem is the basis for believing some other cryptosystems are secure, and again, there are related, less practical systems that are provably secure relative to the solvability or insolvability of the discrete log problem.
+As well as being aware of cryptographic history, cryptographic algorithm and system designers must also sensibly consider probable future developments while working on their designs. For instance, continuous improvements in computer processing power have increased the scope of brute-force attacks, so the key lengths required for a given level of security keep increasing. The potential impact of quantum computing is already being considered by some cryptographic system designers developing post-quantum cryptography. The announced imminence of small implementations of these machines may be making the need for preemptive caution rather more than merely speculative.
--- a/data/en.wikipedia.org/wiki/Cryptography-4.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-4.md
@ -0,0 +1,26 @@
+---
+title: "Cryptography"
+chunk: 5/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+== Modern cryptography ==
+Claude Shannon's two papers, his 1948 paper on information theory, and especially his 1949 paper on cryptography, laid the foundations of modern cryptography and provided a mathematical basis for future cryptography. His 1949 paper has been noted as having provided a "solid theoretical basis for cryptography and for cryptanalysis", and as having turned cryptography from an "art to a science". As a result of his contributions and work, he has been described as the "founding father of modern cryptography".
+Prior to the early 20th century, cryptography was mainly concerned with linguistic and lexicographic patterns. Since then cryptography has broadened in scope, and now makes extensive use of mathematical subdisciplines, including information theory, computational complexity, statistics, combinatorics, abstract algebra, number theory, and finite mathematics. Cryptography is also a branch of engineering, but an unusual one since it deals with active, intelligent, and malevolent opposition; other kinds of engineering (e.g., civil or chemical engineering) need deal only with neutral natural forces. There is also active research examining the relationship between cryptographic problems and quantum physics.
+Just as the development of digital computers and electronics helped in cryptanalysis, it made possible much more complex ciphers. Furthermore, computers allowed for the encryption of any kind of data representable in any binary format, unlike classical ciphers which only encrypted written language texts; this was new and significant. Computer use has thus supplanted linguistic cryptography, both for cipher design and cryptanalysis. Many computer ciphers can be characterized by their operation on binary bit sequences (sometimes in groups or blocks), unlike classical and mechanical schemes, which generally manipulate traditional characters (i.e., letters and digits) directly. However, computers have also assisted cryptanalysis, which has compensated to some extent for increased cipher complexity. Nonetheless, good modern ciphers have stayed ahead of cryptanalysis; it is typically the case that use of a quality cipher is very efficient (i.e., fast and requiring few resources, such as memory or CPU capability), while breaking it requires an effort many orders of magnitude larger, and vastly larger than that required for any classical cipher, making cryptanalysis so inefficient and impractical as to be effectively impossible.
+Research into post-quantum cryptography (PQC) has intensified because practical quantum computers would break widely deployed public-key systems such as RSA, Diffie–Hellman and ECC. A 2017 review in Nature surveys the leading PQC families—lattice-based, code-based, multivariate-quadratic and hash-based schemes—and stresses that standardisation and deployment should proceed well before large-scale quantum machines become available.
+
+=== Symmetric-key cryptography ===
+
+Symmetric-key cryptography refers to encryption methods in which both the sender and receiver share the same key (or, less commonly, in which their keys are different, but related in an easily computable way). This was the only kind of encryption publicly known until June 1976.
+
+Symmetric key ciphers are implemented as either block ciphers or stream ciphers. A block cipher enciphers input in blocks of plaintext as opposed to individual characters, the input form used by a stream cipher.
+The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are block cipher designs that have been designated cryptography standards by the US government (though DES's designation was finally withdrawn after the AES was adopted). Despite its deprecation as an official standard, DES (especially its much more secure triple-DES variant) remains quite popular; it is used across a wide range of applications, from ATM encryption to e-mail privacy and secure remote access. Many other block ciphers have been designed and released, with considerable variation in quality. Many, even some designed by capable practitioners, have been thoroughly broken, such as FEAL.
+Stream ciphers, in contrast to the 'block' type, create an arbitrarily long stream of key material, which is combined with the plaintext bit-by-bit or character-by-character, somewhat like the one-time pad. In a stream cipher, the output stream is created based on a hidden internal state that changes as the cipher operates. That internal state is initially set up using the secret key material. RC4 was long a widely used stream cipher. Block ciphers can be used as stream ciphers by generating blocks of a keystream (in place of a pseudorandom number generator) and XORing each bit of the plaintext with the corresponding bit of the keystream.
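+
+A toy illustration of the keystream idea, assuming SHA-256 applied to key, nonce and a block counter as the keystream generator. This generator, the names `keystream` and `xor_stream`, and the 8-byte counter are choices made for this sketch, not a standard cipher; real systems should use a vetted stream cipher or a block cipher in counter mode. Because XOR is its own inverse, the same function both encrypts and decrypts.
+
+```python
+import hashlib
+
+
+def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
+    """Concatenate SHA-256(key || nonce || counter) blocks until `length` bytes exist."""
+    out = bytearray()
+    counter = 0
+    while len(out) < length:
+        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
+        counter += 1
+    return bytes(out[:length])
+
+
+def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
+    """XOR each byte of data with the corresponding keystream byte."""
+    ks = keystream(key, nonce, len(data))
+    return bytes(d ^ k for d, k in zip(data, ks))
+```
+
+As with the one-time pad, a key and nonce pair must never be reused: two messages encrypted under the same keystream leak the XOR of their plaintexts.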
+Message authentication codes (MACs) are much like cryptographic hash functions, except that a secret key can be used to authenticate the hash value upon receipt; this additional complication blocks attacks that work against bare digest algorithms, such as an attacker altering a message and simply recomputing its hash, and so has been thought worth the effort. Cryptographic hash functions are a third type of cryptographic algorithm. They take a message of any length as input, and output a short, fixed-length hash, which can be used in (for example) a digital signature. For good hash functions, an attacker cannot feasibly find two messages that produce the same hash. MD4 is a long-used hash function that is now broken; MD5, a strengthened variant of MD4, is also widely used but broken in practice. The US National Security Agency developed the Secure Hash Algorithm series of MD5-like hash functions: SHA-0 was a flawed algorithm that the agency withdrew; SHA-1 is widely deployed and more secure than MD5, but cryptanalysts have identified attacks against it; the SHA-2 family improves on SHA-1, but by 2011 cryptanalysts had published attacks on reduced-round versions of it; and the US standards authority thought it "prudent" from a security perspective to develop a new standard to "significantly improve the robustness of NIST's overall hash algorithm toolkit." Thus, NIST held a hash function design competition to select a new U.S. national standard, to be called SHA-3, by 2012. The competition ended on October 2, 2012, when NIST announced that Keccak would be the new SHA-3 hash algorithm. Unlike block and stream ciphers, which are invertible, cryptographic hash functions produce a hashed output that cannot be used to retrieve the original input data. Cryptographic hash functions are used to verify the authenticity of data retrieved from an untrusted source or to add a layer of security.
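+
+The difference between a bare digest and a MAC can be seen with HMAC, one widely used MAC construction, through Python's standard `hmac` module; HMAC with SHA-256 is an assumed choice here, as the text does not name a specific MAC. Anyone who alters the message can recompute a plain SHA-256 digest, but without the key they cannot produce a matching tag.
+
+```python
+import hashlib
+import hmac
+
+
+def tag(key: bytes, message: bytes) -> bytes:
+    """Compute an HMAC-SHA256 tag; only holders of `key` can produce a matching value."""
+    return hmac.new(key, message, hashlib.sha256).digest()
+
+
+def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
+    """Recompute the tag and compare it in constant time."""
+    return hmac.compare_digest(tag(key, message), received_tag)
+
+
+key = b"shared secret key"
+msg = b"transfer 100 to Bob"
+t = tag(key, msg)
+print(verify(key, msg, t))                     # True
+print(verify(key, b"transfer 900 to Eve", t))  # False
+```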
+
+=== Public-key cryptography ===
--- a/data/en.wikipedia.org/wiki/Cryptography-5.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-5.md
@ -0,0 +1,21 @@
+---
+title: "Cryptography"
+chunk: 6/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+Symmetric-key cryptosystems use the same key for encryption and decryption of a message, although a message or group of messages can have a different key than others. A significant disadvantage of symmetric ciphers is the key management necessary to use them securely. Each distinct pair of communicating parties must, ideally, share a different key, and perhaps for each ciphertext exchanged as well. The number of keys required increases as the square of the number of network members, which very quickly requires complex key management schemes to keep them all consistent and secret.
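+
+The quadratic growth comes from counting pairs: n parties form n(n - 1)/2 distinct pairs, each of which needs its own key. A one-line sketch (the function name is illustrative):
+
+```python
+def pairwise_keys(n: int) -> int:
+    """Keys needed if every distinct pair of n parties shares its own secret key."""
+    return n * (n - 1) // 2
+
+
+for members in (10, 100, 1000):
+    print(members, pairwise_keys(members))
+# 10 45
+# 100 4950
+# 1000 499500
+```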
+
+In a groundbreaking 1976 paper, Whitfield Diffie and Martin Hellman proposed the notion of public-key (also, more generally, called asymmetric key) cryptography in which two different but mathematically related keys are used—a public key and a private key. A public key system is so constructed that calculation of one key (the 'private key') is computationally infeasible from the other (the 'public key'), even though they are necessarily related. Instead, both keys are generated secretly, as an interrelated pair. The historian David Kahn described public-key cryptography as "the most revolutionary new concept in the field since polyalphabetic substitution emerged in the Renaissance".
+In public-key cryptosystems, the public key may be freely distributed, while its paired private key must remain secret. The public key is used for encryption, while the private or secret key is used for decryption. While Diffie and Hellman could not find such a system, they showed that public-key cryptography was indeed possible by presenting the Diffie–Hellman key exchange protocol, a solution that is now widely used in secure communications to allow two parties to secretly agree on a shared encryption key.
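+
+A toy run of the Diffie–Hellman exchange, assuming the textbook-sized group p = 23, g = 5 (far too small for real use; deployed systems use large prime-order groups or elliptic curves). Each side keeps its exponent secret and publishes only `g**x mod p`; the helper names are illustrative.
+
+```python
+import secrets
+
+# Toy public parameters (illustrative only): prime modulus P and generator G.
+P = 23
+G = 5
+
+
+def dh_keypair(p: int = P, g: int = G):
+    """Pick a secret exponent in 1..p-2 and return (secret, g**secret mod p)."""
+    secret = secrets.randbelow(p - 2) + 1
+    return secret, pow(g, secret, p)
+
+
+def dh_shared(their_public: int, my_secret: int, p: int = P) -> int:
+    """Raise the other party's public value to one's own secret exponent."""
+    return pow(their_public, my_secret, p)
+
+
+a, A = dh_keypair()  # Alice keeps a, sends A
+b, B = dh_keypair()  # Bob keeps b, sends B
+assert dh_shared(B, a) == dh_shared(A, b)  # both reach g**(a*b) mod p
+
+# Fixed example: a = 4 and b = 3 give A = 4, B = 10 and a shared secret of 18.
+print(pow(G, 4, P), pow(G, 3, P), dh_shared(pow(G, 3, P), 4))  # 4 10 18
+```
+
+On its own this exchange is unauthenticated, so an active attacker in the middle could run a separate exchange with each party; in practice the public values are authenticated, for example with digital signatures.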
+The X.509 standard defines the most commonly used format for public key certificates.
+Diffie and Hellman's publication sparked widespread academic efforts in finding a practical public-key encryption system. This race was finally won in 1978 by Ronald Rivest, Adi Shamir, and Len Adleman, whose solution has since become known as the RSA algorithm.
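+
+A textbook-RSA sketch with tiny primes (p = 61, q = 53, e = 17, a common classroom example assumed here) shows the key relationship: the private exponent d is the inverse of e modulo (p - 1)(q - 1), and decryption undoes encryption. It omits the padding that real RSA requires and relies on Python 3.8+ for `pow(e, -1, phi)`.
+
+```python
+# Textbook RSA with tiny primes: illustrative only (no padding, trivially factorable).
+p, q = 61, 53
+n = p * q                # public modulus, 3233
+phi = (p - 1) * (q - 1)  # 3120
+e = 17                   # public exponent, coprime to phi
+d = pow(e, -1, phi)      # private exponent (modular inverse), 2753
+
+
+def encrypt(m: int) -> int:
+    """Anyone with the public key (n, e) can encrypt a number m < n."""
+    return pow(m, e, n)
+
+
+def decrypt(c: int) -> int:
+    """Only the holder of d can invert the operation."""
+    return pow(c, d, n)
+
+
+c = encrypt(65)
+print(c, decrypt(c))  # 2790 65
+```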
+The Diffie–Hellman and RSA algorithms, in addition to being the first publicly known examples of high-quality public-key algorithms, have been among the most widely used. Other asymmetric-key algorithms include the Cramer–Shoup cryptosystem, ElGamal encryption, and various elliptic curve techniques.
+A document published in 1997 by the Government Communications Headquarters (GCHQ), a British intelligence organization, revealed that cryptographers at GCHQ had anticipated several academic developments. Reportedly, around 1970, James H. Ellis had conceived the principles of asymmetric key cryptography. In 1973, Clifford Cocks invented a solution that was very similar in design rationale to RSA. In 1974, Malcolm J. Williamson is claimed to have developed the Diffie–Hellman key exchange.
+
+Public-key cryptography is also used for implementing digital signature schemes. A digital signature is reminiscent of an ordinary signature; they both have the characteristic of being easy for a user to produce, but difficult for anyone else to forge. Digital signatures can also be permanently tied to the content of the message being signed; they cannot then be 'moved' from one document to another, for any attempt will be detectable. In digital signature schemes, there are two algorithms: one for signing, in which a secret key is used to process the message (or a hash of the message, or both), and one for verification, in which the matching public key is used with the message to check the validity of the signature. RSA and DSA are two of the most popular digital signature schemes. Digital signatures are central to the operation of public key infrastructures and many network security schemes (e.g., SSL/TLS, many VPNs, etc.).
+Public-key algorithms are most often based on the computational complexity of "hard" problems, often from number theory. For example, the hardness of RSA is related to the integer factorization problem, while Diffie–Hellman and DSA are related to the discrete logarithm problem. The security of elliptic curve cryptography is based on number theoretic problems involving elliptic curves. Because of the difficulty of the underlying problems, most public-key algorithms involve operations such as modular multiplication and exponentiation, which are much more computationally expensive than the techniques used in most block ciphers, especially with typical key sizes. As a result, public-key cryptosystems are commonly hybrid cryptosystems, in which a fast high-quality symmetric-key encryption algorithm is used for the message itself, while the relevant symmetric key is sent with the message, but encrypted using a public-key algorithm. Similarly, hybrid signature schemes are often used, in which a cryptographic hash function is computed, and only the resulting hash is digitally signed.
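+
+Hash-then-sign can be sketched with the same kind of toy RSA key: the signer hashes the message and applies the private exponent to the hash, and a verifier checks the result with the public exponent. Reducing a SHA-256 digest modulo a 12-bit modulus and signing without padding are simplifications of this illustration; real schemes use far larger keys and a padding scheme such as RSA-PSS.
+
+```python
+import hashlib
+
+# Toy RSA key (illustrative only; far too small for real signatures).
+p, q = 61, 53
+n, phi = p * q, (p - 1) * (q - 1)
+e = 17
+d = pow(e, -1, phi)  # Python 3.8+
+
+
+def digest_int(message: bytes) -> int:
+    """Hash the message and reduce the digest into the toy modulus."""
+    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n
+
+
+def sign(message: bytes) -> int:
+    """Only the private exponent d can produce the signature."""
+    return pow(digest_int(message), d, n)
+
+
+def verify(message: bytes, signature: int) -> bool:
+    """Anyone with the public key (n, e) can check it."""
+    return pow(signature, e, n) == digest_int(message)
+
+
+s = sign(b"pay Bob 10")
+print(verify(b"pay Bob 10", s))  # True
+```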
--- a/data/en.wikipedia.org/wiki/Cryptography-6.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-6.md
@ -0,0 +1,14 @@
+---
+title: "Cryptography"
+chunk: 7/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+=== Cryptographic hash functions ===
+Cryptographic hash functions are functions that take a variable-length input and return a fixed-length output, which can be used in, for example, a digital signature. For a hash function to be secure, it must be difficult to compute two inputs that hash to the same value (collision resistance) and to compute an input that hashes to a given output (preimage resistance). MD4 is a long-used hash function that is now broken; MD5, a strengthened variant of MD4, is also widely used but broken in practice. The US National Security Agency developed the Secure Hash Algorithm series of MD5-like hash functions: SHA-0 was a flawed algorithm that the agency withdrew; SHA-1 is widely deployed and more secure than MD5, but cryptanalysts have identified attacks against it; the SHA-2 family improves on SHA-1, but by 2011 cryptanalysts had published attacks on reduced-round versions of it; and the US standards authority thought it "prudent" from a security perspective to develop a new standard to "significantly improve the robustness of NIST's overall hash algorithm toolkit." Thus, NIST held a hash function design competition to select a new U.S. national standard, to be called SHA-3, by 2012. The competition ended on October 2, 2012, when NIST announced that Keccak would be the new SHA-3 hash algorithm. Unlike block and stream ciphers, which are invertible, cryptographic hash functions produce a hashed output that cannot be used to retrieve the original input data. Cryptographic hash functions are used to verify the authenticity of data retrieved from an untrusted source or to add a layer of security.
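+
+Python's standard `hashlib` module exposes both SHA-256 (from the SHA-2 family) and SHA3-256 (the Keccak-based SHA-3 standard). This small sketch shows the fixed output length and checks the widely published SHA-256 test vector for the input "abc"; collision and preimage resistance cannot be demonstrated by an example like this. The `fingerprint` helper is illustrative.
+
+```python
+import hashlib
+
+
+def fingerprint(data: bytes, algorithm: str = "sha256") -> str:
+    """Return the hex digest of data using SHA-256 or SHA3-256."""
+    hashers = {"sha256": hashlib.sha256, "sha3_256": hashlib.sha3_256}
+    return hashers[algorithm](data).hexdigest()
+
+
+for algo in ("sha256", "sha3_256"):
+    small = fingerprint(b"a", algo)
+    big = fingerprint(b"a" * 1_000_000, algo)
+    # Both digests are 64 hex characters (256 bits), whatever the input length.
+    print(algo, len(small), len(big), small == big)
+
+print(fingerprint(b"abc"))
+# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
+```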
+
+=== Cryptanalysis ===
--- a/data/en.wikipedia.org/wiki/Cryptography-7.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-7.md
@ -0,0 +1,21 @@
+---
+title: "Cryptography"
+chunk: 8/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+The goal of cryptanalysis is to find some weakness or insecurity in a cryptographic scheme, thus permitting its subversion or evasion.
+It is a common misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message. Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to make use of the cipher. In such cases, effective security could be achieved if it is proven that the effort required (i.e., "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof has been found to date, the one-time pad remains the only theoretically unbreakable cipher. Although well-implemented one-time pad encryption cannot be broken, traffic analysis is still possible.
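+
+A minimal one-time pad sketch, assuming byte strings and Python's `secrets` module as the source of randomness (an operating-system CSPRNG stands in here for the truly random key the proof requires). The last lines show why the pad must never be reused: XORing two ciphertexts made with the same key cancels the key and leaves the XOR of the two plaintexts.
+
+```python
+import secrets
+
+
+def otp_encrypt(plaintext: bytes) -> tuple:
+    """Draw a fresh random key as long as the message and XOR the two together."""
+    key = secrets.token_bytes(len(plaintext))
+    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
+    return key, ciphertext
+
+
+def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
+    """XOR is its own inverse, so the same operation recovers the plaintext."""
+    if len(key) < len(ciphertext):
+        raise ValueError("one-time pad key must be at least as long as the message")
+    return bytes(c ^ k for c, k in zip(ciphertext, key))
+
+
+# Reusing a pad is fatal: c1 XOR c2 equals p1 XOR p2, with the key cancelled out.
+key, c1 = otp_encrypt(b"attack at dawn")
+c2 = bytes(p ^ k for p, k in zip(b"retreat at six", key))
+leak = bytes(x ^ y for x, y in zip(c1, c2))
+assert leak == bytes(x ^ y for x, y in zip(b"attack at dawn", b"retreat at six"))
+```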
+There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what Eve (an attacker) knows and what capabilities are available. In a ciphertext-only attack, Eve has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, Eve has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, Eve may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII. In a chosen-ciphertext attack, Eve may be able to choose ciphertexts and learn their corresponding plaintexts. Finally, in a man-in-the-middle attack, Eve positions herself between Alice (the sender) and Bob (the recipient), reads and modifies the traffic, and then forwards it to the recipient. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved).
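A known-plaintext attack is easiest to see against a misused stream cipher. In this hypothetical Python sketch, the same random keystream encrypts two messages, which is the kind of key reuse the one-time pad forbids. Eve knows one plaintext and its ciphertext. She XORs them to recover the keystream and then reads the second message. The messages are invented for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # zip stops at the shorter input, so only overlapping bytes are combined.
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical misuse: one keystream encrypts two different messages.
keystream = secrets.token_bytes(32)
known_plaintext = b"WEATHER REPORT FOR 0600 HOURS"  # Eve knows this text
secret_plaintext = b"CONVOY DEPARTS AT 0700"        # Eve wants this text
c1 = xor_bytes(known_plaintext, keystream)
c2 = xor_bytes(secret_plaintext, keystream)

# Plaintext XOR ciphertext reveals the keystream bytes directly.
recovered_keystream = xor_bytes(known_plaintext, c1)
# Those bytes decrypt any other message encrypted under the same keystream.
recovered = xor_bytes(c2, recovered_keystream)
print(recovered)  # b'CONVOY DEPARTS AT 0700'
```

The attack recovers only as many keystream bytes as the known plaintext covers. Here that is enough because the secret message is shorter.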
+Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts (with their corresponding ciphertexts) and approximately $2^{43}$ DES operations. This is a considerable improvement over brute-force attacks.
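The DES figures above can be checked with a few lines of arithmetic. This Python sketch assumes DES's 56-bit effective key. It compares operation counts only and ignores the considerable practical cost of gathering $2^{43}$ known plaintexts.

```python
import math

DES_KEY_BITS = 56  # DES has a 56-bit effective key

keyspace = 2 ** DES_KEY_BITS
brute_force_trials = keyspace // 2   # on average, about half the keys are tried
linear_operations = 2 ** 43          # DES operations for linear cryptanalysis

speedup = brute_force_trials // linear_operations
print(brute_force_trials == 2 ** 55)  # True
print(speedup, math.log2(speedup))    # 4096 12.0
```

In operation count, linear cryptanalysis is therefore about $2^{12}$ times cheaper than an average brute-force search.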
+Public-key algorithms are based on the computational difficulty of various problems. The most famous of these are the difficulty of integer factorization of semiprimes and the difficulty of calculating discrete logarithms, neither of which has been proven to be solvable in polynomial time (P) using only a classical Turing-complete computer. Much public-key cryptanalysis concerns designing algorithms in P that can solve these problems, or using other technologies, such as quantum computers. For instance, the best-known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best-known algorithms for factoring, at least for problems of roughly equivalent size. Thus, to achieve an equivalent strength of encryption, techniques that depend upon the difficulty of factoring large composite numbers, such as the RSA cryptosystem, require larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
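To make the dependence on factoring concrete, here is a textbook-RSA sketch in Python with deliberately tiny primes. The primes 1009 and 1013, the exponent 65537 and the message 42 are our illustrative choices. Real moduli are thousands of bits long, and real RSA uses padding. The sketch shows that anyone who can factor the public modulus can recompute the private exponent.

```python
import math


def make_toy_rsa(p: int, q: int, e: int = 65537) -> tuple[int, int, int]:
    """Return (n, e, d) for small primes p and q. Insecure; illustration only."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return n, e, d


def factor_by_trial_division(n: int) -> tuple[int, int]:
    """Naive factoring; feasible here only because n is tiny."""
    for candidate in range(2, math.isqrt(n) + 1):
        if n % candidate == 0:
            return candidate, n // candidate
    raise ValueError("n has no nontrivial factor")


n, e, d = make_toy_rsa(1009, 1013)
m = 42
c = pow(m, e, n)  # textbook RSA encryption, no padding

# The attacker sees only (n, e, c). Factoring n exposes the private exponent.
f1, f2 = factor_by_trial_division(n)
d_recovered = pow(e, -1, (f1 - 1) * (f2 - 1))
print(pow(c, d_recovered, n))  # 42
```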
+While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, for example, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, they may be able to use a timing attack to break a cipher that is otherwise resistant to analysis. An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. Social engineering and other attacks against humans (e.g., bribery, extortion, blackmail, espionage, rubber-hose cryptanalysis or torture) are usually employed because they are far more cost-effective, and far more feasible to carry out in a reasonable amount of time, than pure cryptanalysis.
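A timing attack exploits code whose running time depends on secret data. The Python sketch below uses a hypothetical early-exit comparison, `naive_equals`, that reports how many bytes it examined; that count stands in for measured time. It then recovers a made-up four-digit PIN one position at a time from that signal alone. The standard library's `hmac.compare_digest` is intended to avoid this leak by not short-circuiting on the first mismatch.

```python
import hmac


def naive_equals(secret: bytes, guess: bytes) -> tuple[bool, int]:
    """Early-exit comparison that also reports how many bytes it examined.

    The step count stands in for running time: it grows with the length of
    the correct prefix, which is what a timing attacker measures.
    """
    if len(secret) != len(guess):
        return False, 0
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps


def recover_pin(oracle, length: int) -> bytes:
    """Recover a digit PIN one position at a time from the step-count leak."""
    known = b""
    for position in range(length):
        best_guess, best_steps = b"", -1
        for digit in b"0123456789":
            padding = b"0" * (length - position - 1)
            guess = known + bytes([digit]) + padding
            matched, steps = oracle(guess)
            if matched:
                return guess
            if steps > best_steps:  # the correct digit survives one step longer
                best_guess, best_steps = guess, steps
        known = best_guess[: position + 1]
    return known


secret_pin = b"4921"  # hypothetical secret
print(naive_equals(secret_pin, b"0000"))  # (False, 1)
print(naive_equals(secret_pin, b"4900"))  # (False, 3)
print(recover_pin(lambda g: naive_equals(secret_pin, g), 4))  # b'4921'

# compare_digest avoids content-based short-circuiting, removing this signal.
print(hmac.compare_digest(secret_pin, b"4921"))  # True
```

The leak turns a search over up to 10,000 PINs into at most 40 guesses, ten per position.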
+
+=== Cryptographic primitives ===
+Much of the theoretical work in cryptography concerns cryptographic primitives (algorithms with basic cryptographic properties) and their relationship to other cryptographic problems. These primitives provide fundamental properties from which more complex tools, called cryptosystems or cryptographic protocols, are built to guarantee one or more high-level security properties. Note, however, that the distinction between cryptographic primitives and cryptosystems is quite arbitrary; for example, the RSA algorithm is sometimes considered a cryptosystem, and sometimes a primitive. Typical examples of cryptographic primitives include pseudorandom functions and one-way functions.
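As an illustration of building a higher-level tool from a primitive, the Python sketch below constructs HMAC (RFC 2104), a message authentication code, from nothing but the SHA-256 hash function. It then checks the result against the standard library's `hmac` module. HMAC is our example here, not one named in the text above.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 (RFC 2104) built directly from the SHA-256 primitive."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key)  # ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # opad
    inner_digest = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner_digest).digest()


key, msg = b"shared secret", b"transfer 100 to Bob"
ours = hmac_sha256(key, msg)
reference = hmac.new(key, msg, hashlib.sha256).digest()
print(ours == reference)  # True
```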
+
+=== Cryptosystems ===
--- a/data/en.wikipedia.org/wiki/Cryptography-8.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-8.md
@ -0,0 +1,39 @@
+---
+title: "Cryptography"
+chunk: 9/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+One or more cryptographic primitives are often used to develop a more complex algorithm, called a cryptographic system, or cryptosystem. Cryptosystems (e.g., ElGamal encryption) are designed to provide particular functionality (e.g., public-key encryption) while guaranteeing certain security properties (e.g., chosen-plaintext attack (CPA) security in the random oracle model). Cryptosystems use the properties of the underlying cryptographic primitives to support the system's security properties. Because the distinction between primitives and cryptosystems is somewhat arbitrary, a sophisticated cryptosystem can be derived from a combination of several more primitive cryptosystems. In many cases, the cryptosystem's structure involves back-and-forth communication among two or more parties in space (e.g., between the sender of a secure message and its receiver) or across time (e.g., cryptographically protected backup data). Such cryptosystems are sometimes called cryptographic protocols.
+Some widely known cryptosystems include RSA, Schnorr signatures, ElGamal encryption, and Pretty Good Privacy (PGP). More complex cryptosystems include electronic cash systems and signcryption systems. Some more 'theoretical' cryptosystems include interactive proof systems (like zero-knowledge proofs) and systems for secret sharing.
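ElGamal encryption combines modular exponentiation, whose inverse is the discrete logarithm problem, with a fresh random value per message. This Python sketch implements textbook ElGamal over the integers modulo the small prime 2357, with base 2. These parameters are illustrative assumptions and far too small for real use. The sketch also omits the message encoding and careful group choice that a deployed scheme would need.

```python
import secrets

P = 2357  # small prime modulus (illustrative assumption, insecure)
G = 2     # public base (illustrative assumption)


def keygen() -> tuple[int, int]:
    """Return (private x, public y = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1  # x in [1, P-2]
    return x, pow(G, x, P)


def encrypt(y: int, m: int) -> tuple[int, int]:
    """Encrypt an integer message 1 <= m < P under public key y."""
    if not 1 <= m < P:
        raise ValueError("message must be in [1, P-1]")
    k = secrets.randbelow(P - 2) + 1  # fresh ephemeral exponent per message
    return pow(G, k, P), (m * pow(y, k, P)) % P


def decrypt(x: int, c1: int, c2: int) -> int:
    """Recover m = c2 / c1^x mod P."""
    shared = pow(c1, x, P)                # equals y^k = G^(x*k)
    return (c2 * pow(shared, -1, P)) % P  # modular inverse needs Python 3.8+


x, y = keygen()
c1, c2 = encrypt(y, 1234)
print(decrypt(x, c1, c2))  # 1234
```

Because `k` is drawn afresh each time, encrypting the same message twice usually yields different ciphertexts.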
+
+=== Lightweight cryptography ===
+Lightweight cryptography (LWC) concerns cryptographic algorithms developed for strictly constrained environments. The growth of the Internet of Things (IoT) has spurred research into lightweight algorithms that are better suited to such environments, which impose strict constraints on power consumption and processing power while still requiring security. Algorithms such as Ascon and SPECK are examples of the many LWC algorithms that have been developed to meet the criteria of the CAESAR Competition and the standard set by the National Institute of Standards and Technology.
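Lightweight designs like SPECK keep the round function to a few cheap word operations. The Python sketch below follows the published Speck32/64 parameters as we understand them: 16-bit words, rotation amounts 7 and 2, 22 rounds and a four-word key. It uses only modular addition, rotation and XOR, which is why such designs are called "ARX". The key words and plaintext are arbitrary examples. The sketch checks only that decryption inverts encryption. Verify it against the designers' official test vectors before treating it as a faithful implementation, and do not use it to protect real data.

```python
MASK = 0xFFFF       # 16-bit words
ALPHA, BETA = 7, 2  # rotation amounts for Speck32/64
ROUNDS = 22


def ror(v: int, r: int) -> int:
    return ((v >> r) | (v << (16 - r))) & MASK


def rol(v: int, r: int) -> int:
    return ((v << r) | (v >> (16 - r))) & MASK


def expand_key(k0: int, l_words: list[int]) -> list[int]:
    """Derive 22 round keys from k0 and three further key words (l0, l1, l2).

    The key schedule reuses the round function, with the round index as key.
    """
    k, l = [k0], list(l_words)
    for i in range(ROUNDS - 1):
        l.append(((k[i] + ror(l[i], ALPHA)) & MASK) ^ i)
        k.append(rol(k[i], BETA) ^ l[-1])
    return k


def encrypt(x: int, y: int, round_keys: list[int]) -> tuple[int, int]:
    for rk in round_keys:
        x = ((ror(x, ALPHA) + y) & MASK) ^ rk  # rotate, add, XOR round key
        y = rol(y, BETA) ^ x                   # rotate, XOR
    return x, y


def decrypt(x: int, y: int, round_keys: list[int]) -> tuple[int, int]:
    for rk in reversed(round_keys):
        y = ror(y ^ x, BETA)                    # undoes y = rol(y) ^ x
        x = rol(((x ^ rk) - y) & MASK, ALPHA)   # undoes x = (ror(x) + y) ^ rk
    return x, y


keys = expand_key(0x0001, [0x0203, 0x0405, 0x0607])  # arbitrary example key
ciphertext = encrypt(0x1234, 0xABCD, keys)
print(decrypt(*ciphertext, keys) == (0x1234, 0xABCD))  # True
```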
+
+== Applications ==
+
+Cryptography is widely used on the internet to help protect user data and prevent eavesdropping. To ensure secrecy during transmission, many systems use private-key (symmetric) cryptography to protect transmitted information. With public-key systems, one can maintain secrecy without a master key or a large number of keys. However, some disk-encryption programs, such as BitLocker and VeraCrypt, generally do not use public-private key cryptography. For example, VeraCrypt uses a password hash to generate a single private key, although it can be configured to run in public-private key systems. The open-source encryption library OpenSSL, written in C, provides free encryption software and tools. The most commonly used encryption cipher is AES, as it has hardware acceleration on x86 processors that support the AES-NI instruction set. A close contender is ChaCha20-Poly1305, which is built on the ChaCha20 stream cipher. It is commonly used on mobile devices, because their ARM-based processors do not feature the x86 AES-NI instruction set extension.
+
+=== Cybersecurity ===
+
+Cryptography can be used to secure communications by encrypting them. Websites use encryption via HTTPS. "End-to-end" encryption, where only the sender and receiver can read messages, is implemented for email in Pretty Good Privacy and for instant messaging in applications such as WhatsApp, Signal and Telegram.
+Operating systems use encryption to keep passwords secret, conceal parts of the system, and ensure that software updates are truly from the system maker. Instead of storing plaintext passwords, computer systems store hashes of them. When a user logs in, the system passes the given password through a cryptographic hash function and compares the result to the hashed value on file. In this manner, the password is never stored in plaintext, so an attacker who obtains the stored hashes does not directly learn the passwords.
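The log-in check described above can be sketched with Python's standard library. Two choices here go beyond the text and are assumptions on our part. First, each user gets a random salt. Second, the sketch uses PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`) with a deliberately high iteration count instead of a single fast hash, because slow salted derivation makes guessing stored passwords more expensive. The iteration count is an illustrative value, not a recommendation, and the passwords are made up.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative work factor (assumption); tune per policy


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store these instead of the password."""
    salt = secrets.token_bytes(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from the login attempt and compare safely."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # avoids a timing leak


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("Tr0ub4dor&3", salt, stored))                   # False
```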
+Encryption is sometimes used to encrypt an entire drive. For example, University College London has implemented BitLocker (a program by Microsoft) to render drive data unreadable unless a user logs in.
+
+=== Cryptocurrencies and cryptoeconomics ===
+Cryptographic techniques enable cryptocurrency technologies, such as distributed ledger technologies (e.g., blockchains), which underpin cryptoeconomics applications such as decentralized finance (DeFi). Key cryptographic techniques that enable cryptocurrencies and cryptoeconomics include, but are not limited to: cryptographic keys, cryptographic hash functions, asymmetric (public-key) encryption, Multi-Factor Authentication (MFA), End-to-End Encryption (E2EE), and Zero-Knowledge Proofs (ZKP).
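The role of hash functions in a blockchain-style ledger can be sketched as a hash chain. Each block records the hash of its predecessor, so altering an earlier block breaks the link stored in the block after it. This Python toy is hypothetical: the block fields and entries are invented. It omits everything else a real cryptocurrency needs, such as digital signatures, consensus and networking.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 of a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list[dict], data: str) -> None:
    """Append a block that commits to the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev_hash, "data": data})


def is_valid(chain: list[dict]) -> bool:
    """Check that every block's stored prev_hash matches its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


ledger: list[dict] = []
for entry in ["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"]:
    append_block(ledger, entry)

valid_before = is_valid(ledger)
ledger[1]["data"] = "Bob pays Carol 200"  # tamper with a past record
valid_after = is_valid(ledger)
print(valid_before, valid_after)  # True False
```

In this toy, altering the newest block goes undetected because no later block yet commits to its hash. Real systems protect the head of the chain through their consensus mechanism.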
+
+=== Quantum computing cybersecurity ===
+Estimates suggest that a quantum computer could reduce the effort required to break today’s strongest RSA or elliptic-curve keys from millennia to mere seconds, rendering current protocols (such as the versions of TLS that rely on those keys) insecure.
+To mitigate this "quantum threat", researchers are developing quantum-resistant algorithms whose security rests on problems believed to remain hard for both classical and quantum computers.
+
+== Legal issues ==
+
+=== Prohibitions ===
+Cryptography has long been of interest to intelligence gathering and law enforcement agencies. Secret communications may be criminal or even treasonous. Because of its facilitation of privacy, and the diminution of privacy attendant on its prohibition, cryptography is also of considerable interest to civil rights supporters. Accordingly, there has been a history of controversial legal issues surrounding cryptography, especially since the advent of inexpensive computers has made widespread access to high-quality cryptography possible.
+In some countries, even the domestic use of cryptography is, or has been, restricted. Until 1999, France significantly restricted the use of cryptography domestically, though it has since relaxed many of these rules. In China and Iran, a license is still required to use cryptography. Many countries have tight restrictions on the use of cryptography. Among the more restrictive are laws in Belarus, Kazakhstan, Mongolia, Pakistan, Singapore, Tunisia, and Vietnam.
+In the United States, cryptography is legal for domestic use, but there has been much conflict over legal issues related to cryptography. One particularly important issue has been the export of cryptography and cryptographic software and hardware. Probably because of the importance of cryptanalysis in World War II and an expectation that cryptography would continue to be important for national security, many Western governments have, at some point, strictly regulated export of cryptography. After World War II, it was illegal in the US to sell or distribute encryption technology overseas; in fact, encryption was designated as auxiliary military equipment and put on the United States Munitions List. Until the development of the personal computer, asymmetric key algorithms (i.e., public key techniques), and the Internet, this was not especially problematic. However, as the Internet grew and computers became more widely available, high-quality encryption techniques became well known around the globe.
--- a/data/en.wikipedia.org/wiki/Cryptography-9.md
+++ b/data/en.wikipedia.org/wiki/Cryptography-9.md
@ -0,0 +1,26 @@
+---
+title: "Cryptography"
+chunk: 10/11
+source: "https://en.wikipedia.org/wiki/Cryptography"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:56:23.628158+00:00"
+instance: "kb-cron"
+---
+
+=== Export controls ===
+
+In the 1990s, there were several challenges to US export regulation of cryptography. After the source code for Philip Zimmermann's Pretty Good Privacy (PGP) encryption program found its way onto the Internet in June 1991, a complaint by RSA Security (then called RSA Data Security, Inc.) resulted in a lengthy criminal investigation of Zimmermann by the US Customs Service and the FBI, though no charges were ever filed. Daniel J. Bernstein, then a graduate student at UC Berkeley, brought a lawsuit against the US government challenging some aspects of the restrictions based on free speech grounds. The 1995 case Bernstein v. United States ultimately resulted in a 1999 decision that printed source code for cryptographic algorithms and systems was protected as free speech by the United States Constitution.
+In 1996, thirty-nine countries signed the Wassenaar Arrangement, an arms control treaty that deals with the export of arms and "dual-use" technologies such as cryptography. The treaty stipulated that the use of cryptography with short key-lengths (56-bit for symmetric encryption, 512-bit for RSA) would no longer be export-controlled. Cryptography exports from the US became less strictly regulated as a consequence of a major relaxation in 2000; there are no longer very many restrictions on key sizes in US-exported mass-market software. Since this relaxation in US export restrictions, and because most personal computers connected to the Internet include US-sourced web browsers such as Firefox or Internet Explorer, almost every Internet user worldwide has potential access to quality cryptography via their browsers (e.g., via Transport Layer Security). The Mozilla Thunderbird and Microsoft Outlook email client programs similarly can transmit and receive emails via TLS, and can send and receive email encrypted with S/MIME. Many Internet users do not realize that their basic application software contains such extensive cryptosystems. These browsers and email programs are so ubiquitous that even governments whose intent is to regulate civilian use of cryptography generally do not find it practical to do much to control distribution or use of cryptography of this quality, so even when such laws are in force, actual enforcement is often effectively impossible.
+
+=== NSA involvement ===
+
+Another contentious issue connected to cryptography in the United States is the influence of the National Security Agency on cipher development and policy. The NSA was involved with the design of DES during its development at IBM and its consideration by the National Bureau of Standards as a possible Federal Standard for cryptography. DES was designed to be resistant to differential cryptanalysis, a powerful and general cryptanalytic technique known to the NSA and IBM at the time. According to Steven Levy, IBM discovered differential cryptanalysis but kept the technique secret at the NSA's request. The technique became publicly known only when Biham and Shamir rediscovered and announced it in the late 1980s. The entire affair illustrates the difficulty of determining what resources and knowledge an attacker might actually have.
+Another instance of the NSA's involvement was the 1993 Clipper chip affair, concerning an encryption microchip intended to be part of the Capstone cryptography-control initiative. Clipper was widely criticized by cryptographers for two reasons. First, the cipher algorithm (called Skipjack) was then classified (it was declassified in 1998, long after the Clipper initiative lapsed), which caused concerns that the NSA had deliberately made the cipher weak to assist its intelligence efforts. Second, the whole initiative was criticized for violating Kerckhoffs's principle, as the scheme included a special escrow key held by the government for use by law enforcement (i.e., wiretapping).
+
+=== Digital rights management ===
+
+Cryptography is central to digital rights management (DRM), a group of techniques for technologically controlling use of copyrighted material, which has been widely implemented and deployed at the behest of some copyright holders. In 1998, U.S. President Bill Clinton signed the Digital Millennium Copyright Act (DMCA), which criminalized all production, dissemination, and use of certain cryptanalytic techniques and technology (now known or later discovered); specifically, those that could be used to circumvent DRM technological schemes. This had a noticeable impact on the cryptography research community, since it could be argued that any cryptanalytic research violated the DMCA. Similar statutes have since been enacted in several countries and regions, including the implementation in the EU Copyright Directive. Similar restrictions are called for by treaties signed by World Intellectual Property Organization member-states.
+The United States Department of Justice and FBI have not enforced the DMCA as rigorously as had been feared by some, but the law, nonetheless, remains a controversial one. Niels Ferguson, a well-respected cryptography researcher, has publicly stated that he will not release some of his research into an Intel security design for fear of prosecution under the DMCA. Cryptologist Bruce Schneier has argued that the DMCA encourages vendor lock-in, while inhibiting actual measures toward cyber-security. Both Alan Cox (longtime Linux kernel developer) and Edward Felten (and some of his students at Princeton) have encountered problems related to the Act. Dmitry Sklyarov was arrested during a visit to the US from Russia, and jailed for five months pending trial for alleged violations of the DMCA arising from work he had done in Russia, where the work was legal. In 2007, the cryptographic keys responsible for Blu-ray and HD DVD content scrambling were discovered and released onto the Internet. In both cases, the Motion Picture Association of America sent out numerous DMCA takedown notices, and there was a massive Internet backlash triggered by the perceived impact of such notices on fair use and free speech.
+
+=== Forced disclosure of encryption keys ===