kb/data/en.wikipedia.org/wiki/Fisher_information-5.md


| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Fisher information | 6/8 | https://en.wikipedia.org/wiki/Fisher_information | reference | science, encyclopedia | 2026-05-05T09:50:15.726073+00:00 | kb-cron |

Suppose $f:[0,\infty )\to (-\infty ,\infty ]$ is a convex function such that $f(x)$ is finite for all $x>0$, $f(1)=0$, and $f(0)=\lim _{t\to 0^{+}}f(t)$ (which could be infinite). Such a function defines an f-divergence $D_{f}$. If $f$ is strictly convex at $1$, then locally at $\theta \in \Theta$ the Fisher information matrix is a metric, in the sense that

$$(\delta \theta )^{T}I(\theta )(\delta \theta )={\frac {1}{f''(1)}}D_{f}(P_{\theta +\delta \theta }\parallel P_{\theta })$$

where $P_{\theta }$ is the distribution parametrized by $\theta$, that is, the distribution with pdf $f(x;\theta )$. In this form, it is clear that the Fisher information matrix is a Riemannian metric and varies correctly under a change of variables (see the section on Reparameterization).

=== Sufficient statistic ===

The information provided by a sufficient statistic is the same as that of the sample X. This may be seen by using Neyman's factorization criterion for a sufficient statistic. If T(X) is sufficient for θ, then

$$f(X;\theta )=g(T(X),\theta )\,h(X)$$

for some functions g and h. The independence of h(X) from θ implies

$${\frac {\partial }{\partial \theta }}\log \left[f(X;\theta )\right]={\frac {\partial }{\partial \theta }}\log \left[g(T(X),\theta )\right],$$

and the equality of information then follows from the definition of Fisher information. More generally, if T = t(X) is a statistic, then

$${\mathcal {I}}_{T}(\theta )\leq {\mathcal {I}}_{X}(\theta )$$

with equality if and only if T is a sufficient statistic.
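The inequality can be checked on a small case. The sketch below is an illustration only; the model and the function names are assumptions, not part of the text above. It takes X to be n i.i.d. Bernoulli(p) draws, so that the sum of the draws is sufficient for p and is Binomial(n, p) distributed, while the first draw on its own is a statistic that is not sufficient. The Fisher information of each statistic is obtained by summing the squared score exactly over its distribution.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fisher_info_binomial(n, p):
    """E[(d/dp log pmf(T; p))^2] for T ~ Binomial(n, p), summed exactly."""
    total = 0.0
    for k in range(n + 1):
        # Score of the binomial log-pmf with respect to p.
        score = k / p - (n - k) / (1 - p)
        total += binomial_pmf(k, n, p) * score**2
    return total


n, p = 10, 0.3

# Information in the full sample: n independent Bernoulli(p) draws.
info_sample = n / (p * (1 - p))

# T = sum of the draws is sufficient, so it keeps all of the information.
info_sum = fisher_info_binomial(n, p)

# T = first draw only is not sufficient, so it carries strictly less.
info_first = fisher_info_binomial(1, p)

print(info_sample, info_sum, info_first)
```

Under these assumptions the first two values agree up to rounding, and the third is smaller by a factor of n.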

=== Reparameterization ===

The Fisher information depends on the parametrization of the problem. If θ and η are two scalar parametrizations of an estimation problem, and θ is a continuously differentiable function of η, then

$${\mathcal {I}}_{\eta }(\eta )={\mathcal {I}}_{\theta }(\theta (\eta ))\left({\frac {d\theta }{d\eta }}\right)^{2}$$

where ${\mathcal {I}}_{\eta }$ and ${\mathcal {I}}_{\theta }$ are the Fisher information measures of η and θ, respectively.

In the vector case, suppose ${\boldsymbol {\theta }}$ and ${\boldsymbol {\eta }}$ are k-vectors which parametrize an estimation problem, and suppose that ${\boldsymbol {\theta }}$ is a continuously differentiable function of ${\boldsymbol {\eta }}$. Then

$${\mathcal {I}}_{\boldsymbol {\eta }}({\boldsymbol {\eta }})={\boldsymbol {J}}^{\textsf {T}}{\mathcal {I}}_{\boldsymbol {\theta }}({\boldsymbol {\theta }}({\boldsymbol {\eta }})){\boldsymbol {J}}$$

where the (i, j)th element of the k × k Jacobian matrix ${\boldsymbol {J}}$ is defined by

$$J_{ij}={\frac {\partial \theta _{i}}{\partial \eta _{j}}},$$

and where ${\boldsymbol {J}}^{\textsf {T}}$ is the matrix transpose of ${\boldsymbol {J}}$.
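As a rough numerical check of the vector transformation rule, the sketch below uses a normal model. Several things here are assumptions made for illustration: the choice of coordinates θ = (μ, σ) and η = (μ, log σ), the standard closed form diag(1/σ², 2/σ²) for the Fisher information in θ, and the helper names. The information in η is computed twice: once through the Jacobian formula, and once directly from the definition, using central differences for the score and a Riemann sum for the expectation.

```python
import numpy as np


def log_pdf_eta(x, eta):
    """Log-density of N(mu, sigma^2) in the coordinates eta = (mu, log sigma)."""
    mu, log_sigma = eta
    sigma = np.exp(log_sigma)
    return -0.5 * np.log(2 * np.pi) - log_sigma - 0.5 * ((x - mu) / sigma) ** 2


def fisher_info_numeric(log_pdf, params, x, h=1e-5):
    """Approximate E[score score^T] on an evenly spaced grid x.

    The score is taken by central differences in each parameter, and the
    expectation is a Riemann sum weighted by the density.
    """
    params = np.asarray(params, dtype=float)
    k = params.size
    scores = np.empty((k, x.size))
    for i in range(k):
        step = np.zeros(k)
        step[i] = h
        scores[i] = (log_pdf(x, params + step) - log_pdf(x, params - step)) / (2 * h)
    weights = np.exp(log_pdf(x, params)) * (x[1] - x[0])
    return (scores * weights) @ scores.T


mu, sigma = 1.0, 2.0

# Closed-form Fisher information in theta = (mu, sigma).
info_theta = np.array([[1 / sigma**2, 0.0], [0.0, 2 / sigma**2]])

# J_ij = d theta_i / d eta_j for theta = (eta_1, exp(eta_2)).
jacobian = np.array([[1.0, 0.0], [0.0, sigma]])
info_eta_formula = jacobian.T @ info_theta @ jacobian

# Direct computation in the eta coordinates, for comparison.
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200_001)
info_eta_numeric = fisher_info_numeric(log_pdf_eta, [mu, np.log(sigma)], x)

print(info_eta_formula)
print(info_eta_numeric)
```

With k = 1 the same rule reduces to the scalar formula, since the Jacobian is then the single derivative dθ/dη.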

In information geometry, this is seen as a change of coordinates on a Riemannian manifold, and the intrinsic properties of curvature are unchanged under different parametrizations. In general, the Fisher information matrix provides a Riemannian metric (more precisely, the Fisher–Rao metric) for the manifold of thermodynamic states, and can be used as an information-geometric complexity measure for a classification of phase transitions. For example, the scalar curvature of the thermodynamic metric tensor diverges at (and only at) a phase transition point. In the thermodynamic context, the Fisher information matrix is directly related to the rate of change in the corresponding order parameters. In particular, such relations identify second-order phase transitions via divergences of individual elements of the Fisher information matrix.

=== Isoperimetric inequality ===

The Fisher information matrix plays a role in an inequality like the isoperimetric inequality. Of all probability distributions with a given entropy, the one whose Fisher information matrix has the smallest trace is the Gaussian distribution. This is like how, of all bounded sets with a given volume, the sphere has the smallest surface area.

The proof involves taking a multivariate random variable $X$ with density function $f$ and adding a location parameter to form a family of densities $\{f(x-\theta )\mid \theta \in \mathbb {R} ^{n}\}$. Then, by analogy with the Minkowski–Steiner formula, the "surface area" of $X$ is defined to be

$$S(X)=\lim _{\varepsilon \to 0}{\frac {e^{H(X+Z_{\varepsilon })}-e^{H(X)}}{\varepsilon }}$$