kb/data/en.wikipedia.org/wiki/Channel_capacity-2.md


| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Channel capacity | 3/3 | https://en.wikipedia.org/wiki/Channel_capacity | reference | science, encyclopedia | 2026-05-05T14:40:07.588658+00:00 | kb-cron |

== Feedback Capacity ==
Feedback capacity is the greatest rate at which information can be reliably transmitted, per unit time, over a point-to-point communication channel in which the receiver feeds back the channel outputs to the transmitter. Information-theoretic analysis of communication systems that incorporate feedback is more complicated and challenging than analysis without feedback. This may be why C.E. Shannon chose feedback as the subject of the first Shannon Lecture, delivered at the 1973 IEEE International Symposium on Information Theory in Ashkelon, Israel.

The feedback capacity is characterized by the maximum of the directed information between the channel inputs and the channel outputs, where the maximization is over the causal conditioning of the input given the output. The term directed information was coined by James Massey in 1990, who showed that it is an upper bound on feedback capacity. For memoryless channels, Shannon showed that feedback does not increase the capacity, so the feedback capacity coincides with the channel capacity characterized by the mutual information between the input and the output.

A closed-form expression for the feedback capacity is known only for a few examples, such as the trapdoor channel and the Ising channel. For some other channels, such as the binary erasure channel with a no-consecutive-ones input constraint and the NOST channel, it is characterized through constant-size optimization problems.

The basic mathematical model for such a communication system comprises a message, an encoder with access to the fed-back channel outputs, a noisy channel, and a decoder.

Here is the formal definition of each element (where the only difference with respect to the nonfeedback capacity is the encoder definition):

* $W$ is the message to be transmitted, taken in an alphabet $\mathcal{W}$;
* $X$ is the channel input symbol ($X^{n}$ is a sequence of $n$ symbols) taken in an alphabet $\mathcal{X}$;
* $Y$ is the channel output symbol ($Y^{n}$ is a sequence of $n$ symbols) taken in an alphabet $\mathcal{Y}$;
* $\hat{W}$ is the estimate of the transmitted message;
* $f_{i}:\mathcal{W}\times\mathcal{Y}^{i-1}\to\mathcal{X}$ is the encoding function at time $i$, for a block of length $n$;
* $p(y_{i}|x^{i},y^{i-1})=p_{Y_{i}|X^{i},Y^{i-1}}(y_{i}|x^{i},y^{i-1})$ is the noisy channel at time $i$, which is modeled by a conditional probability distribution; and
* $\hat{w}:\mathcal{Y}^{n}\to\mathcal{W}$ is the decoding function for a block of length $n$.

That is, for each time $i$ there exists a feedback of the previous output $Y_{i-1}$ such that the encoder has access to all previous outputs $Y^{i-1}$. A $(2^{nR},n)$ code is a pair of encoding and decoding mappings with $\mathcal{W}=[1,2,\dots,2^{nR}]$, and $W$ is uniformly distributed. A rate $R$ is said to be achievable if there exists a sequence of codes $(2^{nR},n)$ such that the average probability of error

$$P_{e}^{(n)}\triangleq\Pr(\hat{W}\neq W)$$

tends to zero as $n\to\infty$. The feedback capacity is denoted by $C_{\text{feedback}}$, and is defined as the supremum over all achievable rates.
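
The roles of the encoding function $f_{i}$, the channel, and the decoding function can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a memoryless binary erasure channel with erasure probability `eps`, and a simple "repeat until received" encoder that uses the fed-back outputs to decide which message bit to send next. All function names (`encode`, `channel`, `decode`, `transmit`, `error_rate`) and the `ERASURE` marker are hypothetical choices made for illustration. Since this channel is memoryless, feedback does not raise its capacity (which is $1-\text{eps}$ bits per use); the scheme merely makes the code very simple.

```python
import random

ERASURE = None  # hypothetical marker for an erased output symbol


def encode(bits, past_outputs):
    """Encoder f_i(w, y^{i-1}): send the first message bit not yet received."""
    delivered = sum(1 for y in past_outputs if y is not ERASURE)
    # Once every bit has been delivered, pad with zeros.
    return bits[delivered] if delivered < len(bits) else 0


def channel(x, eps, rng):
    """Memoryless binary erasure channel: erase x with probability eps."""
    return ERASURE if rng.random() < eps else x


def decode(outputs, k):
    """Decoder w_hat(y^n): keep the first k unerased symbols, or fail."""
    received = [y for y in outputs if y is not ERASURE]
    return received[:k] if len(received) >= k else None


def transmit(bits, n, eps, rng):
    """Run one block of n channel uses with output feedback."""
    outputs = []
    for _ in range(n):
        x = encode(bits, outputs)  # x_i depends only on w and y^{i-1}
        outputs.append(channel(x, eps, rng))
    return decode(outputs, len(bits))


def error_rate(k, n, eps, trials=500, seed=0):
    """Empirical estimate of Pr(W_hat != W) for a rate-k/n code."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        bits = [rng.randrange(2) for _ in range(k)]  # uniform message
        if transmit(bits, n, eps, rng) != bits:
            errors += 1
    return errors / trials
```

Under these assumptions, calling `error_rate(80, 200, 0.5)` (rate 0.4, below $1-\text{eps}=0.5$) should give an error estimate close to zero, while `error_rate(120, 200, 0.5)` (rate 0.6) should give one close to one, in line with the definition of an achievable rate.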

=== Main results on feedback capacity ===
Let $X$ and $Y$ be modeled as random variables. The causal conditioning

$$P(y^{n}||x^{n})\triangleq\prod_{i=1}^{n}P(y_{i}|y^{i-1},x^{i})$$

describes the given channel. The choice of the causally conditional distribution

$$P(x^{n}||y^{n-1})\triangleq\prod_{i=1}^{n}P(x_{i}|x^{i-1},y^{i-1})$$

determines the joint distribution $p_{X^{n},Y^{n}}(x^{n},y^{n})$ due to the chain rule for causal conditioning

$$P(y^{n},x^{n})=P(y^{n}||x^{n})\,P(x^{n}||y^{n-1}),$$

which, in turn, induces a directed information

$$I(X^{n}\to Y^{n})=\mathbf{E}\left[\log\frac{P(Y^{n}||X^{n})}{P(Y^{n})}\right].$$

The feedback capacity is given by

$$C_{\text{feedback}}=\lim_{n\to\infty}\frac{1}{n}\sup_{P_{X^{n}||Y^{n-1}}}I(X^{n}\to Y^{n}),$$

where the supremum is taken over all possible choices of $P_{X^{n}||Y^{n-1}}(x^{n}||y^{n-1})$.
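
For small block lengths, the directed information can be evaluated by brute force from a joint distribution, which may help make the definition concrete. The following is a minimal sketch under stated assumptions, not a method from the article: the joint pmf is assumed to be given as a dictionary mapping `(x_tuple, y_tuple)` to a probability, logarithms are taken base 2, and the names `marginal`, `directed_information`, `bsc_joint`, and `binary_entropy` are hypothetical. Each factor $P(y_{i}|y^{i-1},x^{i})$ of the causal conditioning is obtained as a ratio of prefix marginals.

```python
import itertools
import math
from collections import defaultdict


def marginal(joint, i, j):
    """P(x^i, y^j): marginalize the joint pmf onto the prefixes x^i and y^j."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(x[:i], y[:j])] += p
    return out


def directed_information(joint, n):
    """I(X^n -> Y^n) = E[log2 P(Y^n || X^n) / P(Y^n)], in bits."""
    p_same = [marginal(joint, i, i) for i in range(n + 1)]          # P(x^i, y^i)
    p_lag = [None] + [marginal(joint, i, i - 1) for i in range(1, n + 1)]  # P(x^i, y^{i-1})
    p_y = marginal(joint, 0, n)                                      # P(y^n)
    total = 0.0
    for (x, y), p in joint.items():
        if p == 0.0:
            continue  # zero-probability pairs contribute nothing
        log_causal = 0.0
        for i in range(1, n + 1):
            # P(y_i | y^{i-1}, x^i) = P(x^i, y^i) / P(x^i, y^{i-1})
            num = p_same[i][(x[:i], y[:i])]
            den = p_lag[i][(x[:i], y[:i - 1])]
            log_causal += math.log2(num / den)
        total += p * (log_causal - math.log2(p_y[((), y)]))
    return total


def bsc_joint(n, q):
    """Joint pmf for i.i.d. uniform inputs over a binary symmetric channel."""
    joint = {}
    for x in itertools.product((0, 1), repeat=n):
        for y in itertools.product((0, 1), repeat=n):
            flips = sum(a != b for a, b in zip(x, y))
            joint[(x, y)] = 0.5 ** n * q ** flips * (1 - q) ** (n - flips)
    return joint


def binary_entropy(q):
    """Binary entropy function in bits, for 0 < q < 1."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)
```

As a sanity check, for a memoryless binary symmetric channel with crossover probability `q` and i.i.d. uniform inputs (no feedback), `directed_information(bsc_joint(3, 0.1), 3)` should agree with `3 * (1 - binary_entropy(0.1))`, that is, with $n$ times the single-letter mutual information. This is consistent with the statement that feedback capacity coincides with ordinary capacity for memoryless channels.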

=== Gaussian feedback capacity ===
When the Gaussian noise is colored, the channel has memory. Consider, for instance, the simple case of an autoregressive noise process $z_{i}=z_{i-1}+w_{i}$, where $w_{i}\sim N(0,1)$ is an i.i.d. process.
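
To see why such a noise process gives the channel memory, one can simulate the recursion and look at how strongly successive noise samples are correlated. The sketch below is illustrative only: it assumes the initial condition $z_{0}=0$ (the text does not specify one), and the names `noise_path`, `correlation`, and `empirical_stats` are hypothetical.

```python
import math
import random
import statistics


def noise_path(n, rng):
    """Sample z_1..z_n from z_i = z_{i-1} + w_i, w_i ~ N(0, 1), assuming z_0 = 0."""
    z, path = 0.0, []
    for _ in range(n):
        z += rng.gauss(0.0, 1.0)  # add a fresh i.i.d. innovation w_i
        path.append(z)
    return path


def correlation(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def empirical_stats(n=5, trials=20000, seed=0):
    """Return (variance of z_n, correlation between z_{n-1} and z_n)."""
    rng = random.Random(seed)
    paths = [noise_path(n, rng) for _ in range(trials)]
    last = [p[-1] for p in paths]
    prev = [p[-2] for p in paths]
    return statistics.pvariance(last), correlation(prev, last)
```

Under the assumed initial condition, the variance of $z_{n}$ is $n$ and the correlation between $z_{n-1}$ and $z_{n}$ is $\sqrt{(n-1)/n}$, so `empirical_stats()` should return values near 5 and 0.89. The strong correlation means past noise samples are informative about future ones, which is what a feedback encoder can exploit.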

=== Solution techniques ===
The feedback capacity is difficult to compute in the general case. When the channel is discrete, some solution techniques draw on control theory and Markov decision processes.
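
As a reminder of the kind of tool a Markov decision process formulation brings in, the sketch below runs plain value iteration on a toy two-state problem. It is a generic illustration, not a feedback-capacity computation: the discounted criterion, the toy transition table `P`, the rewards `R`, and the function name `value_iteration` are all assumptions made here, and an actual capacity formulation would need its own state, action, and reward definitions.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Discounted value iteration.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    immediate reward. Returns the value function and a greedy policy.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(max_iter):
        # Bellman backup: Q(s, a) = R(s, a) + gamma * E[V(next state)]
        Q = [[R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
              for a in range(len(P[s]))] for s in range(n)]
        V_new = [max(q) for q in Q]
        converged = max(abs(u - v) for u, v in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    policy = [max(range(len(P[s])), key=lambda a: Q[s][a]) for s in range(n)]
    return V, policy


# Hypothetical toy MDP with deterministic transitions:
# action 0 moves to (or stays in) state 0, action 1 moves to (or stays in) state 1.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # transitions from state 0
    [[(0, 1.0)], [(1, 1.0)]],  # transitions from state 1
]
R = [
    [0.0, 1.0],  # rewards in state 0
    [0.0, 2.0],  # rewards in state 1
]
```

For this toy problem the optimal policy always picks action 1, and the fixed point of the Bellman equation is $V(1)=2/(1-0.9)=20$ and $V(0)=1+0.9\cdot 20=19$, which `value_iteration(P, R)` should reproduce up to the tolerance.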

== See also ==
* Bandwidth (computing)
* Bandwidth (signal processing)
* Bit rate
* Code rate
* Error exponent
* Nyquist rate
* Negentropy
* Redundancy
* Sender, Data compression, Receiver
* Shannon–Hartley theorem
* Spectral efficiency
* Throughput
* Shannon capacity of a graph

=== Advanced Communication Topics ===
* MIMO
* Cooperative diversity

== External links ==
* "Transmission rate of a channel", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
* AWGN Channel Capacity with various constraints on the channel input (interactive demonstration)
