kb/data/en.wikipedia.org/wiki/Fisher_information-2.md


title chunk source category tags date_saved instance
Fisher information 3/8 https://en.wikipedia.org/wiki/Fisher_information reference science, encyclopedia 2026-05-05T09:50:15.726073+00:00 kb-cron

In other words, the precision to which we can estimate θ is fundamentally limited by the Fisher information of the likelihood function. Alternatively, the same conclusion can be obtained directly from the Cauchy–Schwarz inequality for random variables, $|\operatorname{Cov}(A,B)|^{2}\leq \operatorname{Var}(A)\operatorname{Var}(B)$, applied to the random variables $\hat{\theta}(X)$ and $\partial_{\theta}\log f(X;\theta)$, and observing that for unbiased estimators we have

$$\operatorname{Cov}[\hat{\theta}(X),\partial_{\theta}\log f(X;\theta)]=\int \hat{\theta}(x)\,\partial_{\theta}f(x;\theta)\,dx=\partial_{\theta}\operatorname{E}[\hat{\theta}]=1.$$
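For completeness, the remaining step can be written out as follows. This is a sketch that assumes the usual regularity conditions, under which the score $\partial_{\theta}\log f(X;\theta)$ has mean zero, so that its variance equals the Fisher information $\mathcal{I}(\theta)$:

```latex
\begin{aligned}
\operatorname{Var}\left[\partial_{\theta}\log f(X;\theta)\right]
  &= \operatorname{E}\left[\left(\partial_{\theta}\log f(X;\theta)\right)^{2}\right]
   = \mathcal{I}(\theta), \\
1 = \left|\operatorname{Cov}\left[\hat{\theta}(X),\partial_{\theta}\log f(X;\theta)\right]\right|^{2}
  &\leq \operatorname{Var}\left[\hat{\theta}(X)\right]\,\mathcal{I}(\theta), \\
\operatorname{Var}\left[\hat{\theta}(X)\right]
  &\geq \frac{1}{\mathcal{I}(\theta)}.
\end{aligned}
```

The last line is the bound on the variance of an unbiased estimator referred to in the first sentence of this paragraph.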

== Examples ==

=== Single-parameter Bernoulli experiment ===

A Bernoulli trial is a random variable with two possible outcomes, 0 and 1, with 1 having a probability of θ. The outcome can be thought of as determined by the toss of a biased coin, with the probability of heads (1) being θ and the probability of tails (0) being 1 − θ. Let X be a single Bernoulli trial drawn from this distribution. The Fisher information contained in X may be calculated to be:

$$
\begin{aligned}
\mathcal{I}(\theta)
&=-\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\log\left(\theta^{X}(1-\theta)^{1-X}\right)\,\right|\,\theta\right]\\[5pt]
&=-\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\left(X\log\theta+(1-X)\log(1-\theta)\right)\,\right|\,\theta\right]\\[5pt]
&=\operatorname{E}\left[\left.\frac{X}{\theta^{2}}+\frac{1-X}{(1-\theta)^{2}}\,\right|\,\theta\right]\\[5pt]
&=\frac{\theta}{\theta^{2}}+\frac{1-\theta}{(1-\theta)^{2}}\\[5pt]
&=\frac{1}{\theta(1-\theta)}.
\end{aligned}
$$
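The calculation can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `bernoulli_fisher_information` and the test value θ = 0.3 are chosen here for illustration. It takes the exact expectation over the two outcomes of X and evaluates the Fisher information both as the expected squared score and as the negative expected second derivative of the log-likelihood.

```python
def bernoulli_fisher_information(theta):
    """Return (E[score^2], -E[second derivative]) for one Bernoulli(theta) trial."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    expected_score_sq = 0.0
    expected_neg_curvature = 0.0
    for x in (0, 1):
        prob = theta if x == 1 else 1.0 - theta
        # First derivative of x*log(theta) + (1 - x)*log(1 - theta).
        score = x / theta - (1 - x) / (1.0 - theta)
        # Second derivative of the same log-likelihood.
        second = -x / theta**2 - (1 - x) / (1.0 - theta) ** 2
        expected_score_sq += prob * score**2
        expected_neg_curvature += prob * (-second)
    return expected_score_sq, expected_neg_curvature


theta = 0.3  # illustrative value
via_score, via_curvature = bernoulli_fisher_information(theta)
closed_form = 1.0 / (theta * (1.0 - theta))
print(via_score, via_curvature, closed_form)
```

Both quantities should agree with the closed form 1/(θ(1 − θ)) up to floating-point rounding.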

Because Fisher information is additive, the Fisher information contained in n independent Bernoulli trials is therefore

$$\mathcal{I}(\theta)=\frac{n}{\theta(1-\theta)}.$$
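Additivity can be illustrated by brute force for small n. The sketch below is an assumption-labelled illustration: the helper `joint_fisher_information` and the values θ = 0.3, n = 5 are hypothetical choices. It enumerates all outcomes of n independent trials and computes the expected squared score of the joint likelihood.

```python
from itertools import product


def joint_fisher_information(theta, n):
    """E[score^2] for n independent Bernoulli(theta) trials, by full enumeration."""
    total = 0.0
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        # Probability of this particular sequence of 0s and 1s.
        prob = theta**successes * (1.0 - theta) ** (n - successes)
        # Score of the joint log-likelihood, i.e. the sum of per-trial scores.
        score = successes / theta - (n - successes) / (1.0 - theta)
        total += prob * score**2
    return total


theta, n = 0.3, 5  # illustrative values
print(joint_fisher_information(theta, n), n / (theta * (1.0 - theta)))
```

The enumeration grows as 2^n, so this is only practical as a check for small n.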

If $x_{i}$ is one of the $2^{n}$ possible outcomes of n independent Bernoulli trials and $x_{ij}$ is the result of the j-th trial in the i-th outcome, then the probability of $x_{i}$ is given by

$$p(x_{i},\theta)=\prod_{j=1}^{n}\theta^{x_{ij}}(1-\theta)^{1-x_{ij}}.$$

The sample mean of the i-th outcome is $\mu_{i}=(1/n)\sum_{j=1}^{n}x_{ij}$. The expected value of the sample mean (over the sampling distribution) is

$$E(\mu)=\sum_{x_{i}}\mu_{i}\,p(x_{i},\theta)=\theta,$$

where the sum is over all $2^{n}$ possible trial outcomes. The expected value of the square of the sample mean is

$$E(\mu^{2})=\sum_{x_{i}}\mu_{i}^{2}\,p(x_{i},\theta)=\frac{(1+(n-1)\theta)\,\theta}{n},$$

so the variance in the value of the mean is

$$E(\mu^{2})-E(\mu)^{2}=\frac{\theta(1-\theta)}{n}.$$

It is seen that the Fisher information is the reciprocal of the variance of the mean number of successes in n Bernoulli trials. This is generally true. In this case, the Cramér–Rao bound is an equality.
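The sums over all $2^{n}$ outcomes can be reproduced directly for small n. This is a minimal sketch under stated assumptions: the function name `sample_mean_moments` and the values θ = 0.3, n = 4 are illustrative and not taken from the text. It computes the first two moments of the sample mean by enumeration and compares the resulting variance with the reciprocal of the Fisher information n/(θ(1 − θ)).

```python
from itertools import product


def sample_mean_moments(theta, n):
    """Return (E[mu], E[mu^2]) summed over all 2**n outcomes of n Bernoulli(theta) trials."""
    first = 0.0
    second = 0.0
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        # Probability of this outcome: theta^x * (1 - theta)^(1 - x) per trial.
        prob = theta**successes * (1.0 - theta) ** (n - successes)
        mu = successes / n  # sample mean of this outcome
        first += mu * prob
        second += mu**2 * prob
    return first, second


theta, n = 0.3, 4  # illustrative values
mean, mean_sq = sample_mean_moments(theta, n)
variance = mean_sq - mean**2
fisher = n / (theta * (1.0 - theta))
print(mean, mean_sq, variance, 1.0 / fisher)
```

The printed variance and the reciprocal of the Fisher information should coincide up to rounding, which is the equality case of the Cramér–Rao bound described above.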