---
title: "Fisher information"
chunk: 6/8
source: "https://en.wikipedia.org/wiki/Fisher_information"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:15.726073+00:00"
instance: "kb-cron"
---

Let $f:[0,\infty )\to (-\infty ,\infty ]$ be a convex function such that $f(x)$ is finite for all $x>0$, $f(1)=0$, and $f(0)=\lim_{t\to 0^{+}}f(t)$ (which could be infinite). Such a function defines an f-divergence $D_{f}$. If $f$ is strictly convex and twice differentiable at $1$, then locally at $\theta \in \Theta$ the Fisher information matrix is a metric, in the sense that, to second order in $\delta \theta$,

$$(\delta \theta )^{T}I(\theta )(\delta \theta )={\frac {2}{f''(1)}}D_{f}(P_{\theta +\delta \theta }\parallel P_{\theta })+o(\|\delta \theta \|^{2}),$$

where $P_{\theta }$ is the distribution parametrized by $\theta$, that is, the distribution with pdf $f(x;\theta )$ (here $f(x;\theta )$ denotes the density, not the convex function $f$ above). In this form, it is clear that the Fisher information matrix is a Riemannian metric and varies correctly under a change of variables (see the section on Reparameterization).

=== Sufficient statistic ===

The information provided by a sufficient statistic is the same as that of the sample $X$. This may be seen by using Neyman's factorization criterion for a sufficient statistic. If $T(X)$ is sufficient for $\theta$, then

$$f(X;\theta )=g(T(X),\theta )\,h(X)$$

for some functions $g$ and $h$.
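As a concrete illustration of this factorization, the following minimal sketch uses an i.i.d. Poisson sample with $T(X)=\sum_i X_i$. The Poisson model, the sample values, and the helper names `joint_pmf`, `g`, and `h` are assumptions chosen for illustration; they are not taken from the article.

```python
import math

def joint_pmf(xs, theta):
    """Joint pmf f(x; theta) of an i.i.d. Poisson(theta) sample xs."""
    return math.prod(math.exp(-theta) * theta**x / math.factorial(x) for x in xs)

def g(t, n, theta):
    """Factor that depends on the data only through t = sum(xs) (n is fixed)."""
    return math.exp(-n * theta) * theta**t

def h(xs):
    """Factor that does not depend on theta."""
    return 1.0 / math.prod(math.factorial(x) for x in xs)

thetas = (0.5, 1.0, 2.5)
xs = [2, 0, 3, 1]
n, t = len(xs), sum(xs)

# The joint pmf factorizes as g(T(x), theta) * h(x) for every theta tried.
for theta in thetas:
    assert math.isclose(joint_pmf(xs, theta), g(t, n, theta) * h(xs), rel_tol=1e-12)

# Two samples with the same total have a likelihood ratio that is free of theta.
ys = [1, 1, 2, 2]
ratios = [joint_pmf(xs, theta) / joint_pmf(ys, theta) for theta in thetas]
```

Because `xs` and `ys` share the same value of the statistic, every entry of `ratios` equals `h(xs) / h(ys)`, so the data beyond $T$ carry no information about $\theta$.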
The independence of $h(X)$ from $\theta$ implies

$${\frac {\partial }{\partial \theta }}\log \left[f(X;\theta )\right]={\frac {\partial }{\partial \theta }}\log \left[g(T(X),\theta )\right],$$

and the equality of information then follows from the definition of Fisher information. More generally, if $T=t(X)$ is a statistic, then

$${\mathcal {I}}_{T}(\theta )\leq {\mathcal {I}}_{X}(\theta )$$

with equality if and only if $T$ is a sufficient statistic.

=== Reparameterization ===

The Fisher information depends on the parametrization of the problem. If $\theta$ and $\eta$ are two scalar parametrizations of an estimation problem, and $\theta$ is a continuously differentiable function of $\eta$, then

$${\mathcal {I}}_{\eta }(\eta )={\mathcal {I}}_{\theta }(\theta (\eta ))\left({\frac {d\theta }{d\eta }}\right)^{2}$$

where ${\mathcal {I}}_{\eta }$ and ${\mathcal {I}}_{\theta }$ are the Fisher information measures of $\eta$ and $\theta$, respectively.

In the vector case, suppose ${\boldsymbol {\theta }}$ and ${\boldsymbol {\eta }}$ are $k$-vectors which parametrize an estimation problem, and suppose that ${\boldsymbol {\theta }}$ is a continuously differentiable function of ${\boldsymbol {\eta }}$. Then

$${\mathcal {I}}_{\boldsymbol {\eta }}({\boldsymbol {\eta }})={\boldsymbol {J}}^{\textsf {T}}{\mathcal {I}}_{\boldsymbol {\theta }}({\boldsymbol {\theta }}({\boldsymbol {\eta }})){\boldsymbol {J}}$$

where the $(i,j)$th element of the $k\times k$ Jacobian matrix ${\boldsymbol {J}}$ is defined by

$$J_{ij}={\frac {\partial \theta _{i}}{\partial \eta _{j}}},$$

and where ${\boldsymbol {J}}^{\textsf {T}}$ is the matrix transpose of ${\boldsymbol {J}}$.
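The scalar reparameterization rule can be checked numerically. The sketch below is illustrative only: it assumes a single Bernoulli observation with success probability $\theta$ and the log-odds parameter $\eta = \log(\theta/(1-\theta))$, and all function names are hypothetical. It compares the Fisher information computed directly in $\eta$ (expected squared score, with the score taken by a central difference) against ${\mathcal {I}}_{\theta }(\theta (\eta ))\,(d\theta /d\eta )^{2}$.

```python
import math

def fisher_info_theta(theta):
    """Closed-form Fisher information of one Bernoulli(theta) draw."""
    return 1.0 / (theta * (1.0 - theta))

def sigmoid(eta):
    """theta as a function of the log-odds eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_pmf_eta(x, eta):
    """Log pmf of the Bernoulli model written in the parameter eta."""
    theta = sigmoid(eta)
    return x * math.log(theta) + (1 - x) * math.log(1.0 - theta)

def fisher_info_eta_direct(eta, step=1e-5):
    """E[(d/d eta log p(X; eta))^2], score approximated by a central difference."""
    theta = sigmoid(eta)
    total = 0.0
    for x, prob in ((0, 1.0 - theta), (1, theta)):
        score = (log_pmf_eta(x, eta + step) - log_pmf_eta(x, eta - step)) / (2 * step)
        total += prob * score**2
    return total

def fisher_info_eta_formula(eta):
    """I_eta(eta) = I_theta(theta(eta)) * (d theta / d eta)^2."""
    theta = sigmoid(eta)
    dtheta_deta = theta * (1.0 - theta)  # derivative of the sigmoid
    return fisher_info_theta(theta) * dtheta_deta**2

for eta in (-2.0, 0.0, 1.3):
    direct = fisher_info_eta_direct(eta)
    formula = fisher_info_eta_formula(eta)
    assert math.isclose(direct, formula, rel_tol=1e-6)
```

In this example both routes give $\theta (1-\theta )$, which differs from ${\mathcal {I}}_{\theta }(\theta )=1/(\theta (1-\theta ))$: the same model carries a different Fisher information value in each parametrization, as the rule predicts.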
In information geometry, this is seen as a change of coordinates on a Riemannian manifold, and the intrinsic properties of curvature are unchanged under different parametrizations. In general, the Fisher information matrix provides a Riemannian metric (more precisely, the Fisher–Rao metric) for the manifold of thermodynamic states, and can be used as an information-geometric complexity measure for a classification of phase transitions; for example, the scalar curvature of the thermodynamic metric tensor diverges at (and only at) a phase transition point.

In the thermodynamic context, the Fisher information matrix is directly related to the rate of change in the corresponding order parameters. In particular, such relations identify second-order phase transitions via divergences of individual elements of the Fisher information matrix.

=== Isoperimetric inequality ===

The Fisher information matrix plays a role in an inequality analogous to the isoperimetric inequality. Of all probability distributions with a given entropy, the one whose Fisher information matrix has the smallest trace is the Gaussian distribution. This parallels the fact that, of all bounded sets with a given volume, the sphere has the smallest surface area.

The proof involves taking a multivariate random variable $X$ with density function $f$ and adding a location parameter to form a family of densities $\{f(x-\theta )\mid \theta \in \mathbb {R} ^{n}\}$. Then, by analogy with the Minkowski–Steiner formula, the "surface area" of $X$ is defined to be

$$S(X)=\lim _{\varepsilon \to 0}{\frac {e^{H(X+Z_{\varepsilon })}-e^{H(X)}}{\varepsilon }}$$