| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Asymptotic equipartition property | 2/3 | https://en.wikipedia.org/wiki/Asymptotic_equipartition_property | reference | science, encyclopedia | 2026-05-05T14:39:53.761478+00:00 | kb-cron |
$$\lim_{n\to\infty}\Pr\left[\,\left|-\frac{1}{n}\log p(X_1,X_2,\ldots,X_n)-\overline{H}_n(X)\right|<\varepsilon\right]=1\qquad\forall\varepsilon>0$$

where

$$\overline{H}_n(X)=\frac{1}{n}H(X_1,X_2,\ldots,X_n)$$
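As a rough numerical illustration of this statement, the following Python sketch draws independent but not identically distributed bits and compares the sample quantity $-\frac{1}{n}\log p(X_1,\ldots,X_n)$ with $\overline{H}_n(X)$. The alternating biases 0.2 and 0.6, the function name, and the sample size are assumptions chosen for illustration and do not come from the source.

```python
import math
import random

def empirical_vs_average_entropy(probs, seed=0):
    """Draw independent bits X_i ~ Bernoulli(probs[i]) and return the pair
    (-(1/n) log p(X_1..X_n), (1/n) H(X_1..X_n)), both in nats.
    Every entry of probs must lie strictly between 0 and 1."""
    rng = random.Random(seed)
    n = len(probs)
    neg_log_p = 0.0      # accumulates -log p(X_1, ..., X_n)
    joint_entropy = 0.0  # H(X_1, ..., X_n) = sum of H(X_i) by independence
    for q in probs:
        x = 1 if rng.random() < q else 0
        neg_log_p -= math.log(q if x == 1 else 1.0 - q)
        joint_entropy -= q * math.log(q) + (1.0 - q) * math.log(1.0 - q)
    return neg_log_p / n, joint_entropy / n

# Hypothetical non-stationary source: the bias alternates between 0.2 and 0.6.
probs = [0.2 if i % 2 == 0 else 0.6 for i in range(100_000)]
sample_rate, avg_entropy = empirical_vs_average_entropy(probs)
print(sample_rate, avg_entropy, abs(sample_rate - avg_entropy))
```

With a longer sequence the gap between the two printed values should typically shrink, which is what convergence in probability suggests; a single run is of course only one sample path.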

=== Applications ===

The asymptotic equipartition property for a non-stationary discrete-time independent process leads (among other results) to the source coding theorem for non-stationary sources (with independent output symbols) and the noisy-channel coding theorem for non-stationary memoryless channels.

== Measure-theoretic form ==

$T$ is a measure-preserving map on the probability space $\Omega$. If $P$ is a finite or countable partition of $\Omega$, then its entropy is

$$H(P) := -\sum_{p\in P}\mu(p)\ln\mu(p)$$

with the convention that $0\ln 0=0$. We only consider partitions with finite entropy: $H(P)<\infty$.

If $P$ is a finite or countable partition of $\Omega$, then we construct a sequence of partitions by iterating the map:

$$P^{(n)} := P\vee T^{-1}P\vee\dots\vee T^{-(n-1)}P$$

where $P\vee Q$ is the least upper bound partition, that is, the least refined partition that refines both $P$ and $Q$:

$$P\vee Q := \{p\cap q : p\in P,\ q\in Q\}$$
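The entropy and join definitions can be tried out on a small finite example. The sketch below assumes a six-point space with the uniform measure and two hypothetical partitions `P` and `Q`; these choices and the helper names are illustrative and are not taken from the source.

```python
import math

def join(P, Q):
    """P v Q = {p & q : p in P, q in Q}. Empty intersections are dropped,
    since they carry no measure and do not affect the entropy."""
    return {p & q for p in P for q in Q if p & q}

def entropy(P, omega_size):
    """H(P) = -sum mu(p) ln mu(p) in nats, for the uniform measure on a
    finite set with omega_size points (all cells assumed non-empty)."""
    return -sum((len(p) / omega_size) * math.log(len(p) / omega_size) for p in P)

# Hypothetical partitions of Omega = {0, 1, 2, 3, 4, 5}.
P = {frozenset({0, 1, 2}), frozenset({3, 4, 5})}
Q = {frozenset({0, 1, 3}), frozenset({2, 4, 5})}
PQ = join(P, Q)

print(sorted(sorted(cell) for cell in PQ))
print(entropy(P, 6), entropy(Q, 6), entropy(PQ, 6))
```

In this example the join has four cells and its entropy lies between $\max(H(P), H(Q))$ and $H(P)+H(Q)$, as expected for a common refinement.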

Write $P(x)$ for the set in $P$ that contains $x$. So, for example, $P^{(n)}(x)$ is the $n$-letter initial segment of the $(P,T)$ name of $x$. Write $I_P(x)$ for the information (in units of nats) about $x$ that we can recover if we know which element of the partition $P$ contains $x$:

$$I_P(x) := -\ln\mu(P(x))$$
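To make the $(P,T)$ name concrete, here is a minimal Python sketch for one standard example that is not taken from the source: the doubling map $T(x)=2x \bmod 1$ on $[0,1)$ with Lebesgue measure, and the partition $P=\{[0,1/2),[1/2,1)\}$ with cells labelled 0 and 1. Under these assumptions the name of $x$ is its binary expansion, and every cell of $P^{(n)}$ is a dyadic interval of measure $2^{-n}$. The function names are hypothetical.

```python
import math
from fractions import Fraction

def name(x, n):
    """First n letters of the (P, T)-name of x for the doubling map
    T(x) = 2x mod 1, with P = {[0, 1/2), [1/2, 1)} labelled 0 and 1.
    Exact rational arithmetic avoids floating-point drift under iteration."""
    letters = []
    for _ in range(n):
        letters.append(0 if x < Fraction(1, 2) else 1)
        x = (2 * x) % 1
    return letters

def information(n):
    """I_{P^(n)}(x) = -ln mu(P^(n)(x)). In this example every cell of P^(n)
    has Lebesgue measure 2^-n, so the value is n ln 2 for every x."""
    return -math.log(2.0 ** -n)

print(name(Fraction(5, 16), 4))          # 5/16 is 0.0101 in binary
print(information(4), 4 * math.log(2))
```

Here $\frac{1}{n}I_{P^{(n)}}(x)=\ln 2$ for every $x$ and every $n$, so the convergence discussed in this section is trivial for this particular system.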

Similarly, the conditional information of partition $P$, conditional on partition $Q$, about $x$, is

$$I_{P|Q}(x) := -\ln\frac{\mu\big((P\vee Q)(x)\big)}{\mu\big(Q(x)\big)}$$
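A short consequence of the two definitions is a chain rule for the information function. The derivation below reads the conditional information as a ratio of measures, $\mu((P\vee Q)(x))/\mu(Q(x))$, which is an interpretation of the formula above rather than something the source states explicitly.

```latex
\begin{aligned}
I_{P\vee Q}(x)
  &= -\ln \mu\big((P\vee Q)(x)\big) \\
  &= -\ln \mu\big(Q(x)\big)
     - \ln \frac{\mu\big((P\vee Q)(x)\big)}{\mu\big(Q(x)\big)} \\
  &= I_{Q}(x) + I_{P|Q}(x).
\end{aligned}
```

Taking expectations over $x\sim\mu$ gives the corresponding chain rule for entropies, $H(P\vee Q)=H(Q)+H(P|Q)$.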




  
    
$h_T(P)$ is the Kolmogorov-Sinai entropy

$$h_T(P) := \lim_n \frac{1}{n}H(P^{(n)}) = \lim_n E_{x\sim\mu}\left[\frac{1}{n}I_{P^{(n)}}(x)\right]$$
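The limit in this definition can be watched numerically in a simple case. The sketch below assumes a two-state stationary Markov shift, with $P$ the partition by the first symbol, so that the cells of $P^{(n)}$ are the length-$n$ cylinder sets. The transition matrix `A`, the stationary distribution `PI`, and the function names are hypothetical choices for illustration, not part of the source.

```python
import math
from itertools import product

# Hypothetical two-state stationary Markov shift (an assumption for illustration).
A = [[0.9, 0.1],
     [0.4, 0.6]]   # transition probabilities A[i][j] = Pr(next = j | current = i)
PI = [0.8, 0.2]    # stationary distribution, satisfying PI = PI A

def block_entropy_rate(n):
    """(1/n) H(P^(n)) in nats, summing over all 2^n length-n cylinder sets."""
    total = 0.0
    for word in product((0, 1), repeat=n):
        mu = PI[word[0]]
        for a, b in zip(word, word[1:]):
            mu *= A[a][b]          # measure of the cylinder set for this word
        total -= mu * math.log(mu)
    return total / n

def entropy_rate():
    """Closed form -sum_i PI_i sum_j A_ij ln A_ij for a stationary Markov chain."""
    return -sum(PI[i] * A[i][j] * math.log(A[i][j])
                for i in range(2) for j in range(2))

for n in (1, 2, 4, 8, 12):
    print(n, block_entropy_rate(n))
print("limit", entropy_rate())
```

For a stationary Markov chain $H(P^{(n)})=H(\pi)+(n-1)h$, where $h$ is the closed-form rate, so the printed values decrease toward $h$ at rate $1/n$.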

In other words, by definition, there is convergence in expectation. The SMB theorem states that when $T$ is ergodic, there is convergence in $L^1$.

If $T$ is not necessarily ergodic, then the underlying probability space splits into multiple subsets, each invariant under $T$. In this case, we still have $L^1$ convergence to some function, but that function is no longer constant.

When $T$ is ergodic, $\mathcal{I}$ is trivial, and so the function

$$x\mapsto E\left[\lim_n I_{P|\vee_{k=1}^{n}T^{-k}P}\;\Big|\;\mathcal{I}\right]$$

simplifies into the constant function $x\mapsto E\left[\lim_n I_{P|\vee_{k=1}^{n}T^{-k}P}\right]$, which by definition equals $\lim_n H(P|\vee_{k=1}^{n}T^{-k}P)$, which in turn equals $h_T(P)$ by a proposition.
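The proposition invoked in the last step is not spelled out in this section. One standard way to see it, sketched here using only the chain rule for entropy and the fact that $T$ preserves $\mu$, is the following.

```latex
\begin{aligned}
H\big(P^{(n)}\big)
  &= H\big(T^{-(n-1)}P\big)
     + \sum_{j=0}^{n-2} H\Big(T^{-j}P \,\Big|\, \vee_{k=j+1}^{n-1} T^{-k}P\Big)
     && \text{(chain rule)} \\
  &= H(P) + \sum_{m=1}^{n-1} H\Big(P \,\Big|\, \vee_{k=1}^{m} T^{-k}P\Big)
     && \text{($T$ preserves $\mu$, with $m = n-1-j$)} .
\end{aligned}
```

The conditional entropies $H(P|\vee_{k=1}^{m}T^{-k}P)$ are non-increasing in $m$, because conditioning on a finer partition cannot increase entropy, so they converge. Dividing by $n$ and taking the Cesàro limit then gives $h_T(P)=\lim_m H(P|\vee_{k=1}^{m}T^{-k}P)$.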

== Continuous-time stationary ergodic sources ==

Discrete-time functions can be interpolated to continuous-time functions. If such an interpolation $f$ is measurable, we may define the continuous-time stationary process accordingly as $\tilde{X} := f\circ X$. If the asymptotic equipartition property holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation, i.e.

$$-\frac{1}{n}\log p(\tilde{X}_0^{\tau})\to H(X)$$

where $n$ corresponds to the number of degrees of freedom in time $\tau$. The quantities $nH(X)/\tau$ and $H(X)$ are the entropy per unit time and per degree of freedom, respectively, as defined by Shannon.

An important class of such continuous-time stationary processes is the bandlimited stationary ergodic processes whose sample space is a subset of the continuous $\mathcal{L}_2$ functions. The asymptotic equipartition property holds if the process is white, in which case the time samples are i.i.d., or if there exists $T > 1/2W$, where $W$ is the nominal bandwidth, such that the $T$-spaced time samples take values in a finite set, in which case we have a discrete-time finite-valued stationary ergodic process. Any time-invariant operation also preserves the asymptotic equipartition property, stationarity, and ergodicity, and we may easily turn a stationary process into a non-stationary one without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process.