---
title: "Asymptotic equipartition property"
chunk: 2/3
source: "https://en.wikipedia.org/wiki/Asymptotic_equipartition_property"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T14:39:53.761478+00:00"
instance: "kb-cron"
---

  
    
      
        
{\displaystyle \lim _{n\to \infty }\Pr \left[\,\left|-{\frac {1}{n}}\log p(X_{1},X_{2},\ldots ,X_{n})-{\overline {H}}_{n}(X)\right|<\varepsilon \right]=1\qquad \forall \varepsilon >0}

where

{\displaystyle {\overline {H}}_{n}(X)={\frac {1}{n}}H(X_{1},X_{2},\ldots ,X_{n})}
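
The statement above can be checked numerically. The following is a minimal sketch, assuming a hypothetical source of independent but not identically distributed Bernoulli symbols whose bias drifts with the time index; the bias schedule, the function names and the seed are illustrative choices, not part of the article. By independence, the joint entropy is the sum of the per-symbol entropies, so the averaged entropy can be accumulated alongside the log-probability of the sampled sequence.

```python
import math
import random


def binary_entropy(q):
    """Entropy in nats of a Bernoulli(q) variable, with 0 ln 0 = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)


def aep_gap(n, seed=0):
    """Return |-(1/n) log p(X_1..X_n) - H_bar_n(X)| for one sampled sequence."""
    rng = random.Random(seed)
    log_p = 0.0    # running log-probability of the sampled sequence
    joint_h = 0.0  # H(X_1, ..., X_n), a sum of H(X_i) by independence
    for i in range(n):
        # Hypothetical time-varying bias, always inside [0.1, 0.5].
        q = 0.3 + 0.2 * math.sin(i)
        x = 1 if rng.random() < q else 0
        log_p += math.log(q if x == 1 else 1 - q)
        joint_h += binary_entropy(q)
    return abs(-log_p / n - joint_h / n)


for n in (100, 10_000, 100_000):
    print(n, aep_gap(n))
```

The printed gap should tend to shrink as n grows, which is what convergence in probability suggests; a single sample path does not prove the limit.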
  

=== Applications ===
The asymptotic equipartition property for a non-stationary discrete-time independent process leads (among other results) to the source coding theorem for non-stationary sources with independent output symbols and to the noisy-channel coding theorem for non-stationary memoryless channels.

== Measure-theoretic form ==

{\textstyle T} is a measure-preserving map on the probability space {\textstyle \Omega } (with probability measure {\textstyle \mu }).
If {\textstyle P} is a finite or countable partition of {\textstyle \Omega }, then its entropy is

{\displaystyle H(P):=-\sum _{p\in P}\mu (p)\ln \mu (p)}

with the convention that {\displaystyle 0\ln 0=0}.
We only consider partitions with finite entropy: {\textstyle H(P)<\infty }.
If {\textstyle P} is a finite or countable partition of {\textstyle \Omega }, then we construct a sequence of partitions by iterating the map:

{\displaystyle P^{(n)}:=P\vee T^{-1}P\vee \dots \vee T^{-(n-1)}P}

where {\textstyle P\vee Q} is the least upper bound partition, that is, the least refined partition that refines both {\textstyle P} and {\textstyle Q}:

{\displaystyle P\vee Q:=\{p\cap q:p\in P,q\in Q\}}
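
The entropy of a partition and the join of two partitions can be made concrete on a finite space. The sketch below assumes a hypothetical six-point space with the uniform measure; the names `entropy`, `join`, `P`, `Q` and `PQ` are illustrative. Empty intersections are dropped from the join, since they carry no measure and contribute nothing to the entropy.

```python
import math
from fractions import Fraction


def entropy(partition, mu):
    """H(P) = -sum mu(p) ln mu(p) in nats, skipping null cells (0 ln 0 = 0)."""
    total = 0.0
    for cell in partition:
        m = float(sum(mu[w] for w in cell))
        if m > 0:
            total -= m * math.log(m)
    return total


def join(P, Q):
    """P v Q: every nonempty intersection of a cell of P with a cell of Q."""
    return [p & q for p in P for q in Q if p & q]


# Hypothetical toy space: six points with the uniform probability measure.
omega = range(6)
mu = {w: Fraction(1, 6) for w in omega}

P = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
Q = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
PQ = join(P, Q)

print(sorted(sorted(cell) for cell in PQ))
print(entropy(P, mu), entropy(Q, mu), entropy(PQ, mu))
```

In this example the join has four cells, and its entropy lies between the larger of the two individual entropies and their sum, as expected for a common refinement.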
  
Write {\textstyle P(x)} for the set in {\textstyle P} that contains {\textstyle x}. So, for example, {\textstyle P^{(n)}(x)} is the {\textstyle n}-letter initial segment of the {\textstyle (P,T)} name of {\textstyle x}.
Write {\textstyle I_{P}(x)} for the information (in units of nats) about {\textstyle x} that we can recover if we know which element of the partition {\textstyle P} contains {\textstyle x}:

{\displaystyle I_{P}(x):=-\ln \mu (P(x))}
Similarly, the conditional information of partition {\textstyle P}, conditional on partition {\textstyle Q}, about {\textstyle x}, is

{\displaystyle I_{P|Q}(x):=-\ln {\frac {\mu ((P\vee Q)(x))}{\mu (Q(x))}}}
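
A short worked consequence of the two definitions, given as a sketch with every measure taken with respect to {\textstyle \mu } and {\textstyle H(P|Q)} read as the expectation of the conditional information: since each cell of {\textstyle P\vee Q} lies inside a cell of {\textstyle Q}, the ratio inside the logarithm is at most 1, so {\textstyle I_{P|Q}(x)\geq 0}, and splitting the logarithm gives a chain rule for almost every {\textstyle x}:

{\displaystyle I_{P\vee Q}(x)=-\ln \mu ((P\vee Q)(x))=-\ln \mu (Q(x))-\ln {\frac {\mu ((P\vee Q)(x))}{\mu (Q(x))}}=I_{Q}(x)+I_{P|Q}(x)}

Taking the expectation over {\textstyle x\sim \mu } and using {\textstyle E_{x\sim \mu }[I_{P}(x)]=H(P)} turns this into the corresponding identity for entropies:

{\displaystyle H(P\vee Q)=H(Q)+H(P|Q),\qquad H(P|Q):=E_{x\sim \mu }\left[I_{P|Q}(x)\right]}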
  

  
    
      
        
{\textstyle h_{T}(P)} is the Kolmogorov-Sinai entropy

{\displaystyle h_{T}(P):=\lim _{n}{\frac {1}{n}}H(P^{(n)})=\lim _{n}E_{x\sim \mu }\left[{\frac {1}{n}}I_{P^{(n)}}(x)\right]}
In other words, by definition, there is convergence in expectation. The Shannon-McMillan-Breiman (SMB) theorem states that when {\textstyle T} is ergodic, there is convergence in L1.
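
Both kinds of convergence can be illustrated on a concrete ergodic system. The sketch below assumes a hypothetical two-state stationary Markov shift, with the partition given by the first symbol, so that the cells of the iterated partition are the cylinder sets of length n. The transition matrix, the function names and the seed are illustrative assumptions. It relies on the standard closed form for the entropy rate of a stationary Markov chain as the reference value.

```python
import math
import random
from itertools import product

# Hypothetical two-state Markov shift (an assumption, not from the text).
A = [[0.9, 0.1], [0.4, 0.6]]  # transition probabilities
PI = [0.8, 0.2]               # stationary distribution: PI A = PI


def cylinder_measure(word):
    """Measure of the cell of P^(n) whose (P, T)-name starts with `word`."""
    m = PI[word[0]]
    for a, b in zip(word, word[1:]):
        m *= A[a][b]
    return m


def block_entropy_rate(n):
    """(1/n) H(P^(n)) in nats, summing over all 2**n cells of P^(n)."""
    h = 0.0
    for word in product((0, 1), repeat=n):
        m = cylinder_measure(word)
        h -= m * math.log(m)
    return h / n


def ks_entropy():
    """Closed form for a stationary Markov chain: -sum pi_i A_ij ln A_ij."""
    return -sum(PI[i] * A[i][j] * math.log(A[i][j])
                for i in range(2) for j in range(2))


def sample_information_rate(n, seed=0):
    """(1/n) I_{P^(n)}(x) for one point x drawn from the stationary measure."""
    rng = random.Random(seed)
    state = 0 if rng.random() < PI[0] else 1
    info = -math.log(PI[state])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < A[state][0] else 1
        info -= math.log(A[state][nxt])
        state = nxt
    return info / n


print([round(block_entropy_rate(n), 4) for n in (1, 2, 4, 8, 12)])
print(round(ks_entropy(), 4), round(sample_information_rate(100_000), 4))
```

The first line shows the averaged block entropies decreasing toward the entropy rate (convergence in expectation). The second compares that rate with the normalized information along a single long sample path, which is the quantity the SMB theorem is about.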

If {\textstyle T} is not necessarily ergodic, then the underlying probability space would be split up into multiple subsets, each invariant under {\textstyle T}. In this case, we still have L1 convergence to some function, but that function is no longer a constant function.

When {\textstyle T} is ergodic, the σ-algebra of invariant sets {\textstyle {\mathcal {I}}} is trivial, and so the function

{\displaystyle x\mapsto E\left[\lim _{n}I_{P|\vee _{k=1}^{n}T^{-k}P}{\big |}\;{\mathcal {I}}\right]}

simplifies into the constant function {\textstyle x\mapsto E\left[\lim _{n}I_{P|\vee _{k=1}^{n}T^{-k}P}\right]}, which by definition equals {\textstyle \lim _{n}H(P|\vee _{k=1}^{n}T^{-k}P)}, which equals {\textstyle h_{T}(P)} by a proposition.

== Continuous-time stationary ergodic sources ==
Discrete-time functions can be interpolated to continuous-time functions. If such an interpolation f is measurable, we may define the continuous-time stationary process accordingly as {\displaystyle {\tilde {X}}:=f\circ X}. If the asymptotic equipartition property holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation, that is,

{\displaystyle -{\frac {1}{n}}\log p({\tilde {X}}_{0}^{\tau })\to H(X)}

where n corresponds to the degrees of freedom in time τ. The quantities nH(X)/τ and H(X) are the entropy per unit time and per degree of freedom, respectively, as defined by Shannon.
An important class of such continuous-time stationary processes is the bandlimited stationary ergodic process whose sample space is a subset of the continuous {\displaystyle {\mathcal {L}}_{2}} functions. The asymptotic equipartition property holds in either of two cases: the process is white, in which case the time samples are i.i.d.; or there exists T > 1/(2W), where W is the nominal bandwidth, such that the T-spaced time samples take values in a finite set, in which case we have a discrete-time finite-valued stationary ergodic process.
Any time-invariant operation also preserves the asymptotic equipartition property, stationarity and ergodicity. A stationary process can also easily be turned into a non-stationary one without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process.