

| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Data processing inequality | 1/1 | https://en.wikipedia.org/wiki/Data_processing_inequality | reference | science, encyclopedia | 2026-05-05T11:32:29.558623+00:00 | kb-cron |

The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.

== Statement ==

Let three random variables form the Markov chain $X \rightarrow Y \rightarrow Z$, meaning that the conditional distribution of $Z$ depends only on $Y$, so that $Z$ is conditionally independent of $X$ given $Y$. Specifically, we have such a Markov chain if the joint probability mass function can be written as

$$p(x,y,z) = p(x)\,p(y|x)\,p(z|y) = p(y)\,p(x|y)\,p(z|y)$$
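The factorization above can be checked numerically on a small example. The following is a minimal sketch, assuming binary alphabets and made-up probability values; the names `p_x`, `p_y_given_x`, `p_z_given_y`, and `marginal` are illustrative and not part of the source. It builds the joint pmf from $p(x)\,p(y|x)\,p(z|y)$, then confirms that the second factorization gives the same joint and that $p(z|x,y) = p(z|y)$.

```python
from itertools import product

# Hypothetical toy distributions over binary alphabets (illustrative values only).
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # p_z_given_y[y][z]

# Joint pmf built from the Markov factorization p(x) p(y|x) p(z|y).
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(joint, keep):
    """Sum the joint pmf down to the coordinates listed in `keep`."""
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out

p_y = marginal(joint, (1,))
p_xy = marginal(joint, (0, 1))
p_x_given_y = {(x, y): p_xy[(x, y)] / p_y[(y,)] for (x, y) in p_xy}

# Largest deviation between the joint and the second factorization p(y) p(x|y) p(z|y).
fact_gap = max(
    abs(joint[(x, y, z)] - p_y[(y,)] * p_x_given_y[(x, y)] * p_z_given_y[y][z])
    for (x, y, z) in joint
)

# Largest deviation between p(z|x,y) and p(z|y): zero means Z is independent of X given Y.
ci_gap = max(
    abs(joint[(x, y, z)] / p_xy[(x, y)] - p_z_given_y[y][z])
    for (x, y, z) in joint
)

print(f"factorization gap: {fact_gap:.2e}, conditional independence gap: {ci_gap:.2e}")
```

Both gaps should be zero up to floating-point rounding, since the joint was constructed from the Markov factorization in the first place.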

In this setting, no processing of $Y$, deterministic or random, can increase the information that $Y$ contains about $X$. Using the mutual information, this can be written as:

$$I(X;Y) \geqslant I(X;Z),$$

with equality $I(X;Y) = I(X;Z)$ if and only if $I(X;Y \mid Z) = 0$. That is, $Z$ and $Y$ contain the same information about $X$, and $X \rightarrow Z \rightarrow Y$ also forms a Markov chain.
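As an illustration of the inequality and its equality case, here is a minimal sketch that assumes a uniform binary source passed through two cascaded binary symmetric channels. The helper names (`mutual_information`, `cascade`, `bsc`) and the crossover probabilities are hypothetical choices for this example, not taken from the source.

```python
from math import log2

def mutual_information(p_ab):
    """I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in p_ab.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(
        p * log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in p_ab.items()
        if p > 0
    )

def cascade(p_x, p_y_given_x, p_z_given_y):
    """Return the pairwise joints p(x, y) and p(x, z) for the chain X -> Y -> Z."""
    p_xy, p_xz = {}, {}
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            p_xy[(x, y)] = p_xy.get((x, y), 0.0) + px * pyx
            for z, pzy in p_z_given_y[y].items():
                p_xz[(x, z)] = p_xz.get((x, z), 0.0) + px * pyx * pzy
    return p_xy, p_xz

def bsc(eps):
    """Binary symmetric channel with crossover probability eps, as channel[input][output]."""
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}

p_x = {0: 0.5, 1: 0.5}  # assumed uniform source

# Noisy second stage: Z is a degraded copy of Y, so I(X;Z) should not exceed I(X;Y).
p_xy, p_xz = cascade(p_x, bsc(0.1), bsc(0.2))
i_xy = mutual_information(p_xy)
i_xz = mutual_information(p_xz)

# Equality case: Z is an invertible function of Y (every bit flipped), so no information is lost.
_, p_xz_flip = cascade(p_x, bsc(0.1), bsc(1.0))
i_xz_flip = mutual_information(p_xz_flip)

print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
print(f"I(X;Z) with invertible second stage = {i_xz_flip:.4f} bits")
```

In the second case $Y$ can be recovered exactly from $Z$, so $X \rightarrow Z \rightarrow Y$ is also a Markov chain and the two mutual informations coincide.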

== Proof ==

One can apply the chain rule for mutual information to obtain two different decompositions of $I(X;Y,Z)$:

$$I(X;Z) + I(X;Y \mid Z) = I(X;Y,Z) = I(X;Y) + I(X;Z \mid Y)$$

By the relationship $X \rightarrow Y \rightarrow Z$, we know that $X$ and $Z$ are conditionally independent given $Y$, which means the conditional mutual information $I(X;Z \mid Y) = 0$. The decomposition therefore reduces to $I(X;Y) = I(X;Z) + I(X;Y \mid Z)$, and the data processing inequality then follows from the non-negativity of conditional mutual information, $I(X;Y \mid Z) \geq 0$.
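The steps of the proof can be traced numerically. The sketch below is an illustration under assumed toy distributions, not part of the source: the helpers `entropy` and `mi` and all probability values are hypothetical. It computes each term from joint entropies, using $I(A;B \mid C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)$, and compares the two chain-rule decompositions.

```python
from itertools import product
from math import log2

# Hypothetical Markov chain X -> Y -> Z over binary alphabets (illustrative values only).
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}

def entropy(joint, keep):
    """Shannon entropy in bits of the marginal over the coordinates in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def mi(joint, a, b, given=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, given are index tuples."""
    return (
        entropy(joint, a + given)
        + entropy(joint, b + given)
        - entropy(joint, a + b + given)
        - entropy(joint, given)
    )

X, Y, Z = (0,), (1,), (2,)  # coordinate positions in each outcome tuple

lhs = mi(joint, X, Z) + mi(joint, X, Y, given=Z)  # I(X;Z) + I(X;Y|Z)
mid = mi(joint, X, Y + Z)                         # I(X;Y,Z)
rhs = mi(joint, X, Y) + mi(joint, X, Z, given=Y)  # I(X;Y) + I(X;Z|Y)
i_xz_given_y = mi(joint, X, Z, given=Y)           # vanishes for a Markov chain
i_xy_given_z = mi(joint, X, Y, given=Z)           # non-negative slack in the inequality

print(f"{lhs:.6f} = {mid:.6f} = {rhs:.6f}")
print(f"I(X;Z|Y) = {i_xz_given_y:.2e}, I(X;Y|Z) = {i_xy_given_z:.6f}")
```

The three chain-rule values should agree up to rounding, $I(X;Z \mid Y)$ should be numerically zero, and the gap $I(X;Y) - I(X;Z)$ should equal $I(X;Y \mid Z)$.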

== See also ==

Garbage in, garbage out

== External links ==

http://www.scholarpedia.org/article/Mutual_information