---
title: "Fisher information"
chunk: 3/8
source: "https://en.wikipedia.org/wiki/Fisher_information"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:15.726073+00:00"
instance: "kb-cron"
---

In other words, the precision to which we can estimate θ is fundamentally limited by the Fisher information of the likelihood function.
Alternatively, the same conclusion can be obtained directly from the Cauchy–Schwarz inequality for random variables, $|\operatorname{Cov}(A,B)|^{2} \leq \operatorname{Var}(A)\operatorname{Var}(B)$, applied to the random variables $\hat{\theta}(X)$ and $\partial_{\theta} \log f(X;\theta)$, and observing that for unbiased estimators we have

$$\operatorname{Cov}[\hat{\theta}(X), \partial_{\theta}\log f(X;\theta)] = \int \hat{\theta}(x)\,\partial_{\theta} f(x;\theta)\,dx = \partial_{\theta}\operatorname{E}[\hat{\theta}] = 1.$$

The first equality holds because the score $\partial_{\theta}\log f(X;\theta)$ has mean zero, so the covariance reduces to $\operatorname{E}[\hat{\theta}(X)\,\partial_{\theta}\log f(X;\theta)]$, and $f\,\partial_{\theta}\log f = \partial_{\theta} f$; the second uses the usual regularity conditions that allow differentiating under the integral sign. Since the variance of the score is $\mathcal{I}(\theta)$, the Cauchy–Schwarz inequality becomes $1 \leq \operatorname{Var}(\hat{\theta})\,\mathcal{I}(\theta)$, which is the Cramér–Rao bound.
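As an exact numerical check of the covariance identity, here is a minimal sketch assuming a single Bernoulli observation, $f(x;\theta) = \theta^{x}(1-\theta)^{1-x}$, the illustrative value $\theta = 3/10$, and the unbiased estimator $\hat{\theta}(X) = X$. The helper names (`pmf`, `score`, `expect`, `theta_hat`) are hypothetical; expectations are computed by summing over both outcomes with `fractions.Fraction`, so there is no rounding error.

```python
from fractions import Fraction

def pmf(x, theta):
    """P(X = x) for one Bernoulli trial, x in {0, 1}."""
    return theta if x == 1 else 1 - theta

def score(x, theta):
    """Score d/dtheta log f(x; theta) = x/theta - (1 - x)/(1 - theta)."""
    return x / theta - (1 - x) / (1 - theta)

def expect(g, theta):
    """Exact expectation of g(X), summing over both outcomes."""
    return sum(g(x) * pmf(x, theta) for x in (0, 1))

def theta_hat(x):
    """The unbiased estimator theta_hat(X) = X (illustrative choice)."""
    return Fraction(x)

theta = Fraction(3, 10)
mean_score = expect(lambda x: score(x, theta), theta)
mean_est = expect(theta_hat, theta)
cov = expect(lambda x: (theta_hat(x) - mean_est) * (score(x, theta) - mean_score), theta)
var_est = expect(lambda x: (theta_hat(x) - mean_est) ** 2, theta)
var_score = expect(lambda x: (score(x, theta) - mean_score) ** 2, theta)

print(mean_score, cov)                  # 0 1
print(cov ** 2 <= var_est * var_score)  # True
```

For this particular estimator the product `var_est * var_score` is exactly 1, so the inequality holds with equality.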

== Examples ==

=== Single-parameter Bernoulli experiment ===
A Bernoulli trial is a random variable with two possible outcomes, 0 and 1, with 1 having a probability of θ. The outcome can be thought of as determined by the toss of a biased coin, with the probability of heads (1) being θ and the probability of tails (0) being 1 − θ.
Let X be a single Bernoulli trial, that is, one sample from this distribution. The Fisher information contained in X can be calculated as:

$$
\begin{aligned}
\mathcal{I}(\theta) &= -\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\log\left(\theta^{X}(1-\theta)^{1-X}\right)\right|\theta\right] \\[5pt]
&= -\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\left(X\log\theta + (1-X)\log(1-\theta)\right)\,\right|\,\theta\right] \\[5pt]
&= \operatorname{E}\left[\left.\frac{X}{\theta^{2}} + \frac{1-X}{(1-\theta)^{2}}\,\right|\,\theta\right] \\[5pt]
&= \frac{\theta}{\theta^{2}} + \frac{1-\theta}{(1-\theta)^{2}} \\[5pt]
&= \frac{1}{\theta(1-\theta)}.
\end{aligned}
$$
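The derivation can be checked exactly by evaluating both standard forms of the Fisher information, $\operatorname{E}[(\partial_\theta \log f)^2]$ and $-\operatorname{E}[\partial_\theta^2 \log f]$, over the two outcomes. This is a minimal sketch; the function names and the sample values of θ are illustrative assumptions, not part of the source.

```python
from fractions import Fraction

def pmf(x, theta):
    """P(X = x) for one Bernoulli trial with success probability theta."""
    return theta if x == 1 else 1 - theta

def score(x, theta):
    """First derivative in theta of x*log(theta) + (1 - x)*log(1 - theta)."""
    return x / theta - (1 - x) / (1 - theta)

def second_derivative(x, theta):
    """Second derivative in theta of the same log-likelihood."""
    return -x / theta**2 - (1 - x) / (1 - theta) ** 2

def fisher_info(theta):
    """Return (E[score^2], -E[second derivative]), summed exactly over x in {0, 1}."""
    via_score = sum(score(x, theta) ** 2 * pmf(x, theta) for x in (0, 1))
    via_curvature = -sum(second_derivative(x, theta) * pmf(x, theta) for x in (0, 1))
    return via_score, via_curvature

for theta in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    print(theta, *fisher_info(theta))
# 1/4 16/3 16/3
# 1/2 4 4
# 9/10 100/9 100/9
```

Both columns agree with $1/(\theta(1-\theta))$ at each value.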

Because Fisher information is additive over independent observations, the Fisher information contained in n independent Bernoulli trials is

$$\mathcal{I}(\theta) = \frac{n}{\theta(1-\theta)}.$$
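Additivity can be verified by brute force: the joint score of n independent trials is the sum of the per-trial scores, and its mean square over all $2^n$ outcomes should equal $n/(\theta(1-\theta))$. This sketch uses hypothetical helper names and small illustrative values of n and θ.

```python
from fractions import Fraction
from itertools import product

def joint_pmf(outcome, theta):
    """Probability of one sequence of independent Bernoulli trials."""
    p = Fraction(1)
    for x in outcome:
        p *= theta if x == 1 else 1 - theta
    return p

def joint_score(outcome, theta):
    """Score of the joint likelihood: the sum of the per-trial scores."""
    return sum(x / theta - (1 - x) / (1 - theta) for x in outcome)

def fisher_info_n(n, theta):
    """E[score^2] for n trials, summed exactly over all 2**n outcomes."""
    return sum(
        joint_score(o, theta) ** 2 * joint_pmf(o, theta)
        for o in product((0, 1), repeat=n)
    )

print(fisher_info_n(5, Fraction(1, 3)))  # 45/2, i.e. 5 / ((1/3) * (2/3))
```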

If $x_{i}$ is one of the $2^{n}$ possible outcomes of n independent Bernoulli trials and $x_{ij}$ is the outcome of the j th trial within $x_{i}$, then the probability of $x_{i}$ is given by

$$p(x_{i},\theta) = \prod_{j=1}^{n} \theta^{x_{ij}}(1-\theta)^{1-x_{ij}}.$$
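The product formula can be sanity-checked by enumerating every outcome: there should be $2^n$ of them and their probabilities should sum to 1. A minimal sketch, assuming the illustrative values n = 4 and θ = 2/5; `outcome_probability` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def outcome_probability(x_i, theta):
    """p(x_i, theta) = prod_j theta**x_ij * (1 - theta)**(1 - x_ij)."""
    p = Fraction(1)
    for x_ij in x_i:
        p *= theta ** x_ij * (1 - theta) ** (1 - x_ij)
    return p

n, theta = 4, Fraction(2, 5)
outcomes = list(product((0, 1), repeat=n))  # all 2**n sequences of 0s and 1s
total = sum(outcome_probability(x_i, theta) for x_i in outcomes)
print(len(outcomes), total)  # 16 1
```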

The sample mean of the i th outcome is $\mu_{i} = (1/n)\sum_{j=1}^{n} x_{ij}$. The expected value of the sample mean (over the sampling distribution) is

$$E(\mu) = \sum_{x_{i}} \mu_{i}\,p(x_{i},\theta) = \theta,$$

where the sum is over all $2^{n}$ possible trial outcomes. The expected value of the square of the sample mean is

$$E(\mu^{2}) = \sum_{x_{i}} \mu_{i}^{2}\,p(x_{i},\theta) = \frac{(1+(n-1)\theta)\theta}{n},$$

so the variance of the sample mean is

$$E(\mu^{2}) - E(\mu)^{2} = \frac{\theta(1-\theta)}{n}.$$
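The three moments above can be reproduced by summing over all $2^n$ outcomes exactly as the text describes. This is a minimal sketch with hypothetical helper names and the illustrative values n = 6, θ = 1/3.

```python
from fractions import Fraction
from itertools import product

def outcome_probability(x_i, theta):
    """Probability of one sequence x_i of independent Bernoulli trials."""
    p = Fraction(1)
    for x_ij in x_i:
        p *= theta if x_ij == 1 else 1 - theta
    return p

def sample_mean_moments(n, theta):
    """Return E(mu), E(mu^2) and the variance of the sample mean."""
    e_mu = Fraction(0)
    e_mu2 = Fraction(0)
    for x_i in product((0, 1), repeat=n):
        mu_i = Fraction(sum(x_i), n)  # sample mean of this outcome
        p = outcome_probability(x_i, theta)
        e_mu += mu_i * p
        e_mu2 += mu_i ** 2 * p
    return e_mu, e_mu2, e_mu2 - e_mu ** 2

n, theta = 6, Fraction(1, 3)
e_mu, e_mu2, var_mu = sample_mean_moments(n, theta)
print(e_mu, e_mu2, var_mu)  # 1/3 4/27 1/27
```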

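Thus the Fisher information is the reciprocal of the variance of the mean number of successes in n Bernoulli trials. In general, the Cramér–Rao bound only guarantees $\operatorname{Var}(\hat{\theta}) \geq 1/\mathcal{I}(\theta)$ for an unbiased estimator $\hat{\theta}$; here the sample mean attains it, so in this case the Cramér–Rao bound is an equality.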
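To contrast equality with the strict inequality, a sketch can compare two unbiased estimators against the bound $1/\mathcal{I}(\theta) = \theta(1-\theta)/n$: the sample mean, and a deliberately wasteful estimator that uses only the first trial. The second estimator and all helper names are illustrative assumptions, not taken from the source; variances are computed exactly over all $2^n$ outcomes.

```python
from fractions import Fraction
from itertools import product

def outcome_probability(x_i, theta):
    """Probability of one sequence x_i of independent Bernoulli trials."""
    p = Fraction(1)
    for x_ij in x_i:
        p *= theta if x_ij == 1 else 1 - theta
    return p

def variance(estimator, n, theta):
    """Exact variance of estimator(x_i) over all 2**n outcomes."""
    outcomes = list(product((0, 1), repeat=n))
    probs = [outcome_probability(x_i, theta) for x_i in outcomes]
    mean = sum(estimator(x_i) * p for x_i, p in zip(outcomes, probs))
    return sum((estimator(x_i) - mean) ** 2 * p for x_i, p in zip(outcomes, probs))

def sample_mean(x_i):
    """Unbiased and efficient: attains the Cramer-Rao bound."""
    return Fraction(sum(x_i), len(x_i))

def first_trial(x_i):
    """Also unbiased, but ignores the other n - 1 trials."""
    return Fraction(x_i[0])

n, theta = 5, Fraction(1, 4)
crb = theta * (1 - theta) / n  # 1 / I(theta) for n trials

print(variance(sample_mean, n, theta) / crb)  # 1: bound attained
print(variance(first_trial, n, theta) / crb)  # 5: bound not attained
```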