kb/Fisher_information-1.md at a246fdfd3454927233a3597be9e2ea1731ffbb48

turtle89431 712b063c02 Scrape wikipedia-science: 6045 new, 3188 updated, 9503 total (kb-cron)

2026-05-05 02:51:10 -07:00


title	chunk	source	category	tags	date_saved	instance
Fisher information	2/8	https://en.wikipedia.org/wiki/Fisher_information	reference	science, encyclopedia	2026-05-05T09:50:15.726073+00:00	kb-cron

1. The partial derivative of f(X; θ) with respect to θ exists almost everywhere. (It can fail to exist on a null set, as long as this set does not depend on θ.)
2. The integral of f(X; θ) can be differentiated under the integral sign with respect to θ.
3. The support of f(X; θ) does not depend on θ.

If θ is a vector, then the regularity conditions must hold for every component of θ. It is easy to find an example of a density that does not satisfy the regularity conditions: the density of a Uniform(0, θ) variable fails to satisfy conditions 1 and 3. In this case, even though the Fisher information can be computed from the definition, it will not have the properties it is typically assumed to have.
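
As a worked illustration of the Uniform(0, θ) case (a sketch; the sample size n is introduced here for illustration): a single observation has density f(x; θ) = 1/θ for 0 ≤ x ≤ θ, so on the support

$$\frac{\partial}{\partial\theta}\log f(x;\theta) = -\frac{1}{\theta}, \qquad \operatorname{E}\left[\frac{\partial}{\partial\theta}\log f(X;\theta)\right] = -\frac{1}{\theta} \neq 0, \qquad \operatorname{E}\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right] = \frac{1}{\theta^{2}}.$$

The derivative of the log-density does not have mean zero here, as it would under the regularity conditions. For n independent observations the joint density is θ^(−n) on its support, and the same definition gives

$$\operatorname{E}\left[\left(-\frac{n}{\theta}\right)^{2}\right] = \frac{n^{2}}{\theta^{2}},$$

rather than the n/θ² that summing n single-observation values would give.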

=== In terms of likelihood ===

Because the likelihood of θ given X is always proportional to the probability f(X; θ), their logarithms differ by a constant that is independent of θ, and the derivatives of these logarithms with respect to θ are equal. Thus one can substitute the log-likelihood l(θ; X) for log f(X; θ) in the definitions of Fisher information.
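
Written out, with c(X) > 0 denoting the proportionality factor (a symbol introduced here for illustration; it may depend on X but not on θ):

$$l(\theta;X) = \log\bigl(c(X)\,f(X;\theta)\bigr) = \log c(X) + \log f(X;\theta) \quad\Longrightarrow\quad \frac{\partial}{\partial\theta}\,l(\theta;X) = \frac{\partial}{\partial\theta}\log f(X;\theta).$$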

=== Samples of any size ===

The value X can represent a single sample drawn from a single distribution or a collection of samples drawn from a collection of distributions. If there are n samples and the corresponding n distributions are statistically independent, then the Fisher information is the sum of the single-sample Fisher information values, one for each sample from its distribution. In particular, if the n samples are independent and identically distributed, then the Fisher information of the whole sample is n times the Fisher information of a single observation from the common distribution.
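
The scaling with n can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes a Bernoulli(p) model (for which the single-observation Fisher information is 1/(p(1 − p))) and estimates the Fisher information of the joint sample as the mean squared derivative of the joint log-density. The function names, the value p = 0.3, and the trial count are illustrative choices.

```python
import numpy as np

def bernoulli_score(x, p):
    """Derivative with respect to p of log f(x; p) for a Bernoulli(p) observation."""
    return x / p - (1 - x) / (1 - p)

def empirical_fisher_information(p, n, trials=200_000, seed=0):
    """Monte Carlo estimate of the Fisher information carried by
    n i.i.d. Bernoulli(p) draws: the mean squared joint score."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, p, size=(trials, n))
    # For independent draws the joint log-density is a sum,
    # so the joint score is the sum of the per-draw scores.
    joint_score = bernoulli_score(x, p).sum(axis=1)
    return float(np.mean(joint_score ** 2))

p = 0.3
exact_single = 1.0 / (p * (1 - p))            # closed form for one draw
est_single = empirical_fisher_information(p, n=1)
est_ten = empirical_fisher_information(p, n=10)

print("single draw:", est_single, "exact:", exact_single)
print("ten draws:  ", est_ten, "exact:", 10 * exact_single)
```

Up to Monte Carlo error, the estimate for ten draws should be about ten times the estimate for one draw.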

=== Informal derivation of the Cramér–Rao bound ===

The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of θ. Van Trees (1968) and Frieden (2004) provide the following method of deriving the Cramér–Rao bound, a result that illustrates the use of the Fisher information. Informally, we begin by considering an unbiased estimator $\hat{\theta}(X)$. Mathematically, "unbiased" means that

$$\operatorname{E}\left[\left.\hat{\theta}(X)-\theta \,\right|\, \theta\right] = \int \left(\hat{\theta}(x)-\theta\right) f(x;\theta)\,dx = 0 \quad \text{regardless of the value of } \theta.$$

Because this expression is zero for every value of θ, its partial derivative with respect to θ must also be zero. Differentiating under the integral sign and applying the product rule, this partial derivative is also equal to

$$0 = \frac{\partial}{\partial\theta}\int \left(\hat{\theta}(x)-\theta\right) f(x;\theta)\,dx = \int \left(\hat{\theta}(x)-\theta\right)\frac{\partial f}{\partial\theta}\,dx - \int f\,dx.$$

For each θ, the likelihood function is a probability density function, and therefore $\int f\,dx = 1$. By using the chain rule on the partial derivative of $\log f$ and then dividing and multiplying by $f(x;\theta)$, one can verify that

$$\frac{\partial f}{\partial\theta} = f\,\frac{\partial \log f}{\partial\theta}.$$

Using these two facts in the above, we get

$$\int \left(\hat{\theta}-\theta\right) f\,\frac{\partial \log f}{\partial\theta}\,dx = 1.$$

Factoring the integrand gives

$$\int \left(\left(\hat{\theta}-\theta\right)\sqrt{f}\right)\left(\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right)dx = 1.$$

Squaring both sides and applying the Cauchy–Schwarz inequality to the integral yields

$$1 = \left(\int \left[\left(\hat{\theta}-\theta\right)\sqrt{f}\right]\cdot\left[\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right]dx\right)^{2} \leq \left[\int \left(\hat{\theta}-\theta\right)^{2} f\,dx\right]\cdot\left[\int \left(\frac{\partial \log f}{\partial\theta}\right)^{2} f\,dx\right].$$

The second bracketed factor is defined to be the Fisher information, while the first bracketed factor is the mean-squared error (MSE) of the estimator $\hat{\theta}$. Since the estimator is unbiased, its MSE equals its variance. By rearranging, the inequality tells us that

$$\operatorname{Var}(\hat{\theta}) \geq \frac{1}{\mathcal{I}(\theta)}.$$
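
The bound can be sanity-checked by simulation. The following is a minimal sketch under stated assumptions, not part of the cited derivation: it assumes n i.i.d. Normal(θ, σ²) observations with σ known, for which the Fisher information about θ is n/σ², and compares two unbiased estimators of θ, the sample mean and the sample median. The function name and all parameter values are illustrative.

```python
import numpy as np

def estimator_variances(theta=2.0, sigma=1.5, n=25, trials=50_000, seed=0):
    """Simulate many samples of size n from Normal(theta, sigma^2) and
    return the empirical variances of the sample mean and sample median,
    together with the Cramér–Rao bound 1 / I(theta) = sigma^2 / n."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(theta, sigma, size=(trials, n))
    var_mean = float(samples.mean(axis=1).var())
    var_median = float(np.median(samples, axis=1).var())
    bound = sigma ** 2 / n  # inverse of the Fisher information n / sigma^2
    return var_mean, var_median, bound

var_mean, var_median, bound = estimator_variances()
print("Cramér–Rao bound:      ", bound)
print("variance of the mean:  ", var_mean)
print("variance of the median:", var_median)
```

In this model the sample mean attains the bound, so its simulated variance should match σ²/n up to Monte Carlo error (and may land slightly on either side of it), while the sample median is unbiased but has a visibly larger variance.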
