---
title: "Data processing inequality"
chunk: 1/1
source: "https://en.wikipedia.org/wiki/Data_processing_inequality"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T11:32:29.558623+00:00"
instance: "kb-cron"
---

The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.

== Statement ==
Let three random variables form the Markov chain $X\rightarrow Y\rightarrow Z$, implying that the conditional distribution of $Z$ depends only on $Y$ and that $Z$ is conditionally independent of $X$ given $Y$. Specifically, we have such a Markov chain if the joint probability mass function can be written as

$$p(x,y,z)=p(x)p(y|x)p(z|y)=p(y)p(x|y)p(z|y)$$

In this setting, no processing of $Y$, deterministic or random, can increase the information that $Y$ contains about $X$. Using the mutual information, this can be written as

$$I(X;Y)\geq I(X;Z),$$

with the equality $I(X;Y)=I(X;Z)$ if and only if $I(X;Y\mid Z)=0$. That is, $Z$ and $Y$ contain the same information about $X$, and $X\rightarrow Z\rightarrow Y$ also forms a Markov chain.

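The statement can be checked numerically on a small discrete example. The following is a minimal sketch, not a reference implementation: the alphabet sizes, the random seed, and all helper names (`random_distribution`, `random_channel`, `mutual_information`, `push_through`) are illustrative assumptions. It builds a chain $X\rightarrow Y\rightarrow Z$ from randomly chosen conditional distributions, compares $I(X;Y)$ with $I(X;Z)$, and then replaces the second step by an invertible relabelling of $Y$, for which equality is expected.

```python
import math
import random


def random_distribution(n, rng):
    """Return a random probability vector of length n (hypothetical helper)."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def random_channel(n_in, n_out, rng):
    """Return a row-stochastic matrix; row i is p(output | input = i)."""
    return [random_distribution(n_out, rng) for _ in range(n_in)]


def mutual_information(joint):
    """Mutual information in bits of a joint pmf stored as joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, p_ab in enumerate(row):
            if p_ab > 0.0:
                total += p_ab * math.log2(p_ab / (p_a[a] * p_b[b]))
    return total


def push_through(joint_xy, channel_yz):
    """Joint pmf of (X, Z) when Z depends on Y alone:
    p(x, z) = sum_y p(x, y) p(z | y)."""
    n_z = len(channel_yz[0])
    return [
        [sum(row[y] * channel_yz[y][z] for y in range(len(row))) for z in range(n_z)]
        for row in joint_xy
    ]


# Assumed sizes: |X| = 3, |Y| = 4, |Z| = 3; the seed is arbitrary.
rng = random.Random(0)
p_x = random_distribution(3, rng)
p_y_given_x = random_channel(3, 4, rng)
p_z_given_y = random_channel(4, 3, rng)

# p(x, y) = p(x) p(y | x), then p(x, z) via the Markov property.
joint_xy = [[p_x[x] * p_y_given_x[x][y] for y in range(4)] for x in range(3)]
joint_xz = push_through(joint_xy, p_z_given_y)

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")

# Equality case: a cyclic shift of the labels of Y is invertible,
# so Z carries exactly the information about X that Y does.
shift = [[1.0 if z == (y + 1) % 4 else 0.0 for z in range(4)] for y in range(4)]
i_xz_shift = mutual_information(push_through(joint_xy, shift))
print(f"I(X;Z) after an invertible relabelling = {i_xz_shift:.4f} bits")
```

Comparisons between the computed values should allow a small floating-point tolerance, since the sums are evaluated in different orders.
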
== Proof ==
One can apply the chain rule for mutual information to obtain two different decompositions of $I(X;Y,Z)$:

$$I(X;Z)+I(X;Y\mid Z)=I(X;Y,Z)=I(X;Y)+I(X;Z\mid Y)$$

By the relationship $X\rightarrow Y\rightarrow Z$, we know that $X$ and $Z$ are conditionally independent given $Y$, which means that the conditional mutual information vanishes: $I(X;Z\mid Y)=0$. The decomposition therefore reduces to $I(X;Y)=I(X;Z)+I(X;Y\mid Z)$, and the data processing inequality follows from the non-negativity of conditional mutual information, $I(X;Y\mid Z)\geq 0$.

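Each step of the proof can also be verified numerically. The sketch below is an illustration under stated assumptions rather than a definitive implementation: the helper names (`normalise`, `entropy`, `marginal`, `mi`), the alphabet sizes, and the seed are all hypothetical choices. It constructs a joint distribution of the form $p(x)p(y|x)p(z|y)$ and evaluates both chain-rule decompositions of $I(X;Y,Z)$ using the entropy identity $I(A;B\mid C)=H(A,C)+H(B,C)-H(A,B,C)-H(C)$.

```python
import itertools
import math
import random


def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)


def marginal(joint, axes):
    """Marginalise a pmf keyed by (x, y, z) onto the given tuple of axis indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def mi(joint, a, b, given=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); axes are tuples of indices."""
    def h(axes):
        return entropy(marginal(joint, axes))
    return h(a + given) + h(b + given) - h(a + b + given) - h(given)


# Axis labels for the three variables, and assumed alphabet sizes.
X, Y, Z = (0,), (1,), (2,)
nx, ny, nz = 2, 3, 2

rng = random.Random(1)
p_x = normalise([rng.random() for _ in range(nx)])
p_y_given_x = [normalise([rng.random() for _ in range(ny)]) for _ in range(nx)]
p_z_given_y = [normalise([rng.random() for _ in range(nz)]) for _ in range(ny)]

# Markov chain X -> Y -> Z: p(x, y, z) = p(x) p(y | x) p(z | y).
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in itertools.product(range(nx), range(ny), range(nz))
}

i_x_yz = mi(joint, X, Y + Z)                        # I(X; Y, Z)
left = mi(joint, X, Z) + mi(joint, X, Y, given=Z)   # I(X;Z) + I(X;Y|Z)
right = mi(joint, X, Y) + mi(joint, X, Z, given=Y)  # I(X;Y) + I(X;Z|Y)

print(f"I(X;Y,Z)          = {i_x_yz:.6f}")
print(f"I(X;Z) + I(X;Y|Z) = {left:.6f}")
print(f"I(X;Y) + I(X;Z|Y) = {right:.6f}")
print(f"I(X;Z|Y)          = {mi(joint, X, Z, given=Y):.2e}")
```

Up to floating-point rounding, the three printed totals should agree and $I(X;Z\mid Y)$ should be zero, which is the step that turns the chain rule into the inequality.
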
== See also ==
Garbage in, garbage out
== External links ==
http://www.scholarpedia.org/article/Mutual_information