kb/Dataset_shift-0.md at 9b747ec7958babdff64d9dac8e8df5188dd3f184

turtle89431 5f7a7ab0ae Scrape wikipedia-science: 6185 new, 3208 updated, 9666 total (kb-cron)

2026-05-05 02:54:40 -07:00

2.0 KiB


title	chunk	source	category	tags	date_saved	instance
Dataset shift	1/1	https://en.wikipedia.org/wiki/Dataset_shift	reference	science, encyclopedia	2026-05-05T09:53:59.453907+00:00	kb-cron

Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels differs between the training phase and the deployment or test phase (i.e., $P_{train}(X,Y) \neq P_{test}(X,Y)$). This happens when the statistical properties of the data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability.

Dataset shift is a generic term covering several specific types of distributional change. Covariate shift occurs when the distribution of the input features, $P(X)$, changes but the conditional relationship between inputs and outputs, $P(Y \mid X)$, remains constant. Prior probability shift (or label shift) occurs when the distribution of target labels, $P(Y)$, changes but the conditional distribution of inputs given labels, $P(X \mid Y)$, stays the same. Concept shift (also known as concept drift) is a change in the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time.

Dataset shift is a key challenge in deploying machine learning systems, particularly in dynamic environments where data distributions change over time. Detecting and mitigating such shifts is an active area of research, including work on drift detection, domain adaptation, and continual learning.
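As a rough illustration of drift detection for the covariate shift case, the sketch below compares the input distribution of a training sample with that of two test samples using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the empirical CDFs). Everything here is an assumption made for illustration: the inputs are synthetic one-dimensional Gaussians, the helper names `ks_statistic` and `sample_inputs` are hypothetical, and the KS statistic is only one of many possible drift measures. It inspects the inputs alone, so it would not reveal concept shift.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for v in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def sample_inputs(rng, mean, n):
    """Draw n synthetic one-dimensional inputs X ~ Normal(mean, 1)."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x_train = sample_inputs(rng, mean=0.0, n=2000)
x_test_same = sample_inputs(rng, mean=0.0, n=2000)      # same P(X) as training
x_test_shifted = sample_inputs(rng, mean=1.0, n=2000)   # shifted P(X)

ks_same = ks_statistic(x_train, x_test_same)
ks_shifted = ks_statistic(x_train, x_test_shifted)
print(f"no shift:        KS = {ks_same:.3f}")
print(f"covariate shift: KS = {ks_shifted:.3f}")
```

A statistic near zero suggests the two samples could share one input distribution, while a large value points to a change in $P(X)$. In practice the statistic would be turned into a decision by a significance test or a threshold tuned to the application, and multivariate inputs generally need other techniques.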

== See also ==

* Concept drift
* Domain adaptation
* Overfitting
* Statistical classification

