kb/data/en.wikipedia.org/wiki/Dataset_shift-0.md

2.0 KiB

title chunk source category tags date_saved instance
Dataset shift 1/1 https://en.wikipedia.org/wiki/Dataset_shift reference science, encyclopedia 2026-05-05T09:53:59.453907+00:00 kb-cron

Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e.,

      P
      
        t
        r
        a
        i
        n
      
    
    (
    X
    ,
    Y
    )
    ≠
    
      P
      
        t
        e
        s
        t
      
    
    (
    X
    ,
    Y
    )
  

{\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)}

). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.

== See also == Concept drift Domain adaptation Overfitting Statistical classification

== References ==