---
title: "Fisher information"
chunk: 2/8
source: "https://en.wikipedia.org/wiki/Fisher_information"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:15.726073+00:00"
instance: "kb-cron"
---

1. The partial derivative of f(X; θ) with respect to θ exists almost everywhere. (It can fail to exist on a null set, as long as this set does not depend on θ.)
2. The integral of f(X; θ) can be differentiated under the integral sign with respect to θ.
3. The support of f(X; θ) does not depend on θ.
If θ is a vector, then the regularity conditions must hold for every component of θ. It is easy to find an example of a density that does not satisfy the regularity conditions: the density of a Uniform(0, θ) variable fails to satisfy conditions 1 and 3. In this case, even though the Fisher information can be computed from the definition, it will not have the properties it is typically assumed to have. For example, the score ∂/∂θ log f(X; θ) = −1/θ does not have mean zero.
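The Uniform(0, θ) failure can be made concrete. The sketch below is an illustration of our own, not from the source. The helper names `uniform_score` and `simulate` and the choices θ = 2, n = 10 are hypothetical. It draws samples of size n and checks that the score of the whole sample has mean −n/θ rather than zero. It then compares the usual unbiased estimator ((n + 1)/n) · max(X) with the value 1/I that the Cramér–Rao bound (discussed below) would give if I were taken as the second moment of the score. The estimator's variance, θ²/(n(n + 2)), falls below θ²/n², so the bound does not hold here.

```python
import random
import statistics


def uniform_score(x: float, theta: float) -> float:
    """Score d/dtheta log f(x; theta) for the Uniform(0, theta) density.

    On the support 0 <= x <= theta, log f(x; theta) = -log(theta), so the
    score is -1/theta no matter which x was observed.
    """
    if not 0.0 <= x <= theta:
        raise ValueError("x lies outside the support [0, theta]")
    return -1.0 / theta


def simulate(theta: float, n: int, reps: int, seed: int = 0):
    """Draw `reps` samples of size n; return joint scores and estimates."""
    rng = random.Random(seed)
    scores, estimates = [], []
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        # Joint score of the sample: the single-observation scores add up.
        scores.append(sum(uniform_score(x, theta) for x in sample))
        # (n + 1) / n * max(sample) is an unbiased estimator of theta.
        estimates.append((n + 1) / n * max(sample))
    return scores, estimates


theta, n = 2.0, 10
scores, estimates = simulate(theta, n, reps=20_000)

mean_score = statistics.fmean(scores)           # -n/theta = -5.0, not 0
info = statistics.fmean(s * s for s in scores)  # second moment: n**2/theta**2
naive_bound = 1.0 / info                        # theta**2/n**2
var_hat = statistics.pvariance(estimates)       # near theta**2/(n*(n+2))
print(mean_score, naive_bound, var_hat)
```

Because the score here is the constant −n/θ, its variance is zero while its second moment is n²/θ². Without the regularity conditions, these two usual characterizations of the Fisher information no longer agree.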

=== In terms of likelihood ===
Because the likelihood of θ given X is always proportional to the probability f(X; θ), their logarithms differ by a constant that does not depend on θ, and so their derivatives with respect to θ are equal. Thus one can substitute the log-likelihood l(θ; X) for log f(X; θ) in the definitions of Fisher information.
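As a small illustration of our own (not from the source), take a single Poisson observation x with mean λ. The log-probability is x log λ − λ − log x!, while the log-likelihood is often written without the λ-free term log x!. The minimal sketch below, whose helper names are hypothetical, differentiates both numerically and checks that each matches the analytic score x/λ − 1.

```python
import math


def log_pmf(x: int, lam: float) -> float:
    """Full Poisson log-probability: x*log(lam) - lam - log(x!)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_lik(lam: float, x: int) -> float:
    """Log-likelihood with the lam-free term -log(x!) dropped."""
    return x * math.log(lam) - lam


def central_diff(g, t: float, h: float = 1e-6) -> float:
    """Numerical derivative of g at t by central differences."""
    return (g(t + h) - g(t - h)) / (2 * h)


x, lam = 7, 3.5
d_pmf = central_diff(lambda t: log_pmf(x, t), lam)
d_lik = central_diff(lambda t: log_lik(t, x), lam)
exact = x / lam - 1  # analytic score x/lam - 1
print(d_pmf, d_lik, exact)
```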

=== Samples of any size ===
The value X can represent a single sample drawn from a single distribution, or a collection of samples drawn from a collection of distributions. If there are n samples and the corresponding n distributions are statistically independent, then the Fisher information is the sum of the n single-sample Fisher information values, one for each sample from its distribution. In particular, if the n samples are independent and identically distributed (i.i.d.), then the Fisher information of the whole sample is n times the Fisher information of a single observation from the common distribution.
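The additivity can be checked exactly for a simple model. The following minimal sketch assumes n i.i.d. Bernoulli(p) observations, which is an illustrative choice and not from the source. The names `bernoulli_score` and `fisher_info_iid` are hypothetical helpers. The sketch enumerates all 2^n outcomes, computes the expected squared score of the joint sample, and compares it with n times the single-observation value 1/(p(1 − p)).

```python
from itertools import product


def bernoulli_score(x: int, p: float) -> float:
    """d/dp log f(x; p) for f(x; p) = p**x * (1 - p)**(1 - x)."""
    return x / p - (1 - x) / (1 - p)


def fisher_info_iid(p: float, n: int) -> float:
    """Exact E[score**2] for n i.i.d. Bernoulli(p) draws (all 2**n outcomes)."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        prob = 1.0
        for x in xs:
            prob *= p if x else 1 - p
        # The joint log-likelihood is a sum over draws, so the scores add.
        score = sum(bernoulli_score(x, p) for x in xs)
        total += prob * score ** 2
    return total


p, n = 0.3, 5
single = fisher_info_iid(p, 1)  # equals 1 / (p * (1 - p))
joint = fisher_info_iid(p, n)   # equals n * single
print(single, joint, n * single)
```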

=== Informal derivation of the Cramér–Rao bound ===
The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of θ. Van Trees (1968) and Frieden (2004) give the following method of deriving the Cramér–Rao bound, a result that illustrates one use of the Fisher information.
Informally, we begin by considering an unbiased estimator $\hat{\theta}(X)$. Mathematically, "unbiased" means that

$$\operatorname{E}\left[\left.\hat{\theta}(X)-\theta \,\right|\, \theta\right]=\int \left(\hat{\theta}(x)-\theta\right) f(x;\theta)\,dx=0 \quad \text{regardless of the value of } \theta.$$

This expression is zero for every θ, so its partial derivative with respect to θ must also be zero. Differentiating under the integral sign (regularity condition 2) and applying the product rule shows that this partial derivative is also equal to

$$0=\frac{\partial}{\partial\theta}\int \left(\hat{\theta}(x)-\theta\right) f(x;\theta)\,dx=\int \left(\hat{\theta}(x)-\theta\right)\frac{\partial f}{\partial\theta}\,dx-\int f\,dx.$$

For each θ, $f(x;\theta)$ is a probability density function in x, and therefore $\int f\,dx=1$. By applying the chain rule to the partial derivative of $\log f$ and then dividing and multiplying by $f(x;\theta)$, one can verify that

$$\frac{\partial f}{\partial\theta}=f\,\frac{\partial \log f}{\partial\theta}.$$

Substituting these two facts into the previous equation gives

$$\int \left(\hat{\theta}-\theta\right) f\,\frac{\partial \log f}{\partial\theta}\,dx=1.$$

Factoring the integrand gives

$$\int \left(\left(\hat{\theta}-\theta\right)\sqrt{f}\right)\left(\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right)dx=1.$$

Squaring both sides and applying the Cauchy–Schwarz inequality yields

$$1=\biggl(\int \left[\left(\hat{\theta}-\theta\right)\sqrt{f}\right]\cdot\left[\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right]dx\biggr)^{2}\leq\left[\int \left(\hat{\theta}-\theta\right)^{2} f\,dx\right]\cdot\left[\int \left(\frac{\partial \log f}{\partial\theta}\right)^{2} f\,dx\right].$$

The second bracketed factor is, by definition, the Fisher information $\mathcal{I}(\theta)$, while the first bracketed factor is the mean-squared error (MSE) of the estimator $\hat{\theta}$. Since the estimator is unbiased, its MSE equals its variance. Rearranging, the inequality gives

$$\operatorname{Var}(\hat{\theta})\geq\frac{1}{\mathcal{I}(\theta)}.$$
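The derivation can be checked on a discrete example, where sums replace integrals. This is a minimal sketch under our own assumptions, not part of the source. X is taken to be n i.i.d. Bernoulli(p) draws with θ = p, and `cramer_rao_terms`, `sample_mean`, and `first_draw` are hypothetical names. For each of two unbiased estimators, the sketch computes exactly the quantities that appear in the derivation. These are the bias, the integral shown to equal 1, and the two bracketed factors of the Cauchy–Schwarz step.

```python
from itertools import product


def cramer_rao_terms(p: float, n: int, estimator):
    """Exact expectations over all 2**n outcomes of n i.i.d. Bernoulli(p) draws.

    Returns (E[err], E[err * score], E[err**2], E[score**2]) with
    err = estimator(x) - p and score = d/dp log f(x; p).
    """
    bias = cross = mse = info = 0.0
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        f = p ** k * (1 - p) ** (n - k)    # joint pmf f(x; p)
        score = k / p - (n - k) / (1 - p)  # d/dp log f(x; p)
        err = estimator(xs) - p            # theta_hat(x) - theta
        bias += err * f                    # unbiasedness: should be 0
        cross += err * score * f           # the integral shown to equal 1
        mse += err ** 2 * f                # first bracketed factor (variance)
        info += score ** 2 * f             # second bracketed factor, I(p)
    return bias, cross, mse, info


def sample_mean(xs):
    return sum(xs) / len(xs)


def first_draw(xs):
    # Also unbiased, but ignores every draw except the first one.
    return xs[0]


p, n = 0.3, 4
for est in (sample_mean, first_draw):
    bias, cross, var, info = cramer_rao_terms(p, n, est)
    print(f"{est.__name__}: bias={bias:.2e} cross={cross:.6f} "
          f"var={var:.6f} bound={1 / info:.6f}")
```

For the sample mean, the variance equals 1/I(p) = p(1 − p)/n, so the bound is attained. The first-draw estimator also satisfies the identity E[(θ̂ − θ) · score] = 1. However, its variance p(1 − p) is n times the bound.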