| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Fisher information | 6/8 | https://en.wikipedia.org/wiki/Fisher_information | reference | science, encyclopedia | 2026-05-05T09:50:15.726073+00:00 | kb-cron |
Given a convex function $f:[0,\infty)\to(-\infty,\infty]$ such that $f(x)$ is finite for all $x>0$, $f(1)=0$, and $f(0)=\lim_{t\to 0^{+}}f(t)$ (which could be infinite), one defines an f-divergence $D_f$. If $f$ is strictly convex at $1$ (and twice differentiable there with $f''(1)>0$), then locally at $\theta\in\Theta$ the Fisher information matrix is a metric, in the sense that

$$(\delta\theta)^{T}I(\theta)(\delta\theta)=\frac{2}{f''(1)}\,D_{f}(P_{\theta+\delta\theta}\parallel P_{\theta})+o(\|\delta\theta\|^{2}),$$

where $P_{\theta}$ is the distribution parametrized by $\theta$, that is, the distribution with pdf $f(x;\theta)$. In this form it is clear that the Fisher information matrix is a Riemannian metric and transforms correctly under a change of variables (see the section on Reparameterization).
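The local relation between $D_f$ and the Fisher information can be checked numerically. The sketch below assumes a Bernoulli($\theta$) model, for which $I(\theta)=1/(\theta(1-\theta))$, and three illustrative generators chosen here rather than taken from the text: $f(t)=t\log t$ (Kullback–Leibler, $f''(1)=1$), $f(t)=(\sqrt t-1)^2$ (squared Hellinger up to normalization convention, $f''(1)=1/2$) and $f(t)=(t-1)^2$ (Pearson $\chi^2$, $f''(1)=2$). It evaluates the ratio $2D_f(P_{\theta+\delta}\parallel P_\theta)/(f''(1)\,\delta^2 I(\theta))$, which by the second-order Taylor expansion $D_f(P_{\theta+\delta}\parallel P_\theta)=\tfrac{f''(1)}{2}\,\delta^2 I(\theta)+o(\delta^2)$ should tend to 1 as $\delta\to 0$. The function names are hypothetical.

```python
import math


def f_divergence(f, p, q):
    """D_f(P || Q) = sum_x q(x) f(p(x) / q(x)) for discrete P, Q with q(x) > 0."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def bernoulli(theta):
    """Probability vector [P(X=0), P(X=1)] of a Bernoulli(theta) variable."""
    return [1.0 - theta, theta]


def bernoulli_fisher(theta):
    """Fisher information of one Bernoulli observation: 1 / (theta (1 - theta))."""
    return 1.0 / (theta * (1.0 - theta))


# Illustrative generators f (all with f(1) = 0), each paired with f''(1).
GENERATORS = {
    "KL": (lambda t: t * math.log(t) if t > 0 else 0.0, 1.0),
    "squared Hellinger": (lambda t: (math.sqrt(t) - 1.0) ** 2, 0.5),
    "chi-squared": (lambda t: (t - 1.0) ** 2, 2.0),
}


def metric_ratio(name, theta, delta):
    """(2 / f''(1)) * D_f(P_{theta+delta} || P_theta) / (delta^2 * I(theta)).

    By the second-order expansion this ratio tends to 1 as delta -> 0.
    """
    f, f2 = GENERATORS[name]
    d = f_divergence(f, bernoulli(theta + delta), bernoulli(theta))
    return (2.0 / f2) * d / (delta ** 2 * bernoulli_fisher(theta))


if __name__ == "__main__":
    for name in GENERATORS:
        for delta in (1e-1, 1e-2, 1e-3):
            r = metric_ratio(name, 0.3, delta)
            print(f"{name:18s} delta={delta:g} ratio={r:.6f}")
```

For the $\chi^2$ generator the ratio is exactly 1 for every $\delta$ in this model, because $\sum_x (p_{\theta+\delta}(x)-p_\theta(x))^2/p_\theta(x)=\delta^2/(\theta(1-\theta))$; for the other two generators it approaches 1 with an error of order $\delta$.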
=== Sufficient statistic ===
The information provided by a sufficient statistic is the same as that of the sample X. This may be seen by using Neyman's factorization criterion for a sufficient statistic. If T(X) is sufficient for θ, then

$$f(X;\theta)=g(T(X),\theta)\,h(X)$$
for some functions g and h. The independence of h(X) from θ implies
$$\frac{\partial}{\partial\theta}\log\left[f(X;\theta)\right]=\frac{\partial}{\partial\theta}\log\left[g(T(X),\theta)\right],$$
and the equality of information then follows from the definition of Fisher information. More generally, if T = t(X) is a statistic, then
$$\mathcal{I}_{T}(\theta)\leq\mathcal{I}_{X}(\theta)$$
with equality if and only if T is a sufficient statistic.
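The inequality $\mathcal{I}_T(\theta)\le\mathcal{I}_X(\theta)$ and its equality case can be illustrated by exact enumeration. The sketch below assumes a sample of $n$ i.i.d. Bernoulli($\theta$) draws (a model chosen for illustration), computes the Fisher information of any discrete statistic as $\mathbb{E}[(\partial_\theta\log p_T(T;\theta))^2]$ with the score approximated by central differences, and compares the full sample, the sufficient statistic $T=\sum_i X_i$, and the non-sufficient statistic $T=X_1$. All helper names are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def sample_pmf(theta, n):
    """Joint pmf of n i.i.d. Bernoulli(theta) draws, keyed by the outcome tuple."""
    return {
        xs: math.prod(theta if x else 1.0 - theta for x in xs)
        for xs in itertools.product((0, 1), repeat=n)
    }


def statistic_pmf(theta, n, stat):
    """Pmf of T = stat(X), obtained by summing the joint pmf over each level set."""
    pmf = defaultdict(float)
    for xs, p in sample_pmf(theta, n).items():
        pmf[stat(xs)] += p
    return dict(pmf)


def fisher_information(pmf_of, theta, h=1e-5):
    """I(theta) = E[(d/dtheta log p)^2], with the score taken by central differences."""
    p_mid, p_up, p_dn = pmf_of(theta), pmf_of(theta + h), pmf_of(theta - h)
    total = 0.0
    for t, p in p_mid.items():
        score = (math.log(p_up[t]) - math.log(p_dn[t])) / (2.0 * h)
        total += p * score ** 2
    return total


if __name__ == "__main__":
    n, theta = 4, 0.3
    i_x = fisher_information(lambda th: sample_pmf(th, n), theta)
    i_sum = fisher_information(lambda th: statistic_pmf(th, n, sum), theta)
    i_first = fisher_information(lambda th: statistic_pmf(th, n, lambda xs: xs[0]), theta)
    print(f"full sample: {i_x:.6f}  sum (sufficient): {i_sum:.6f}  X_1 only: {i_first:.6f}")
```

In this model $\mathcal{I}_X(\theta)=n/(\theta(1-\theta))$: the sum recovers all of it, while $X_1$ alone carries only $1/(\theta(1-\theta))$.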
=== Reparameterization ===
The Fisher information depends on the parametrization of the problem. If θ and η are two scalar parametrizations of an estimation problem, and θ is a continuously differentiable function of η, then

$$\mathcal{I}_{\eta}(\eta)=\mathcal{I}_{\theta}(\theta(\eta))\left(\frac{d\theta}{d\eta}\right)^{2}$$
where $\mathcal{I}_{\eta}$ and $\mathcal{I}_{\theta}$ are the Fisher information measures of η and θ, respectively. In the vector case, suppose that $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ are k-vectors which parametrize an estimation problem, and that $\boldsymbol{\theta}$ is a continuously differentiable function of $\boldsymbol{\eta}$. Then
$$\mathcal{I}_{\boldsymbol{\eta}}(\boldsymbol{\eta})=\boldsymbol{J}^{\textsf{T}}\,\mathcal{I}_{\boldsymbol{\theta}}(\boldsymbol{\theta}(\boldsymbol{\eta}))\,\boldsymbol{J}$$
where the (i, j)th element of the k × k Jacobian matrix $\boldsymbol{J}$ is defined by

$$J_{ij}=\frac{\partial\theta_{i}}{\partial\eta_{j}},$$

and where $\boldsymbol{J}^{\textsf{T}}$ is the matrix transpose of $\boldsymbol{J}$.
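Both transformation rules can be sketched with two standard models, chosen here for illustration. The first is a Bernoulli variable reparametrized from its mean $\theta$ to its log-odds $\eta$, with $\theta=1/(1+e^{-\eta})$, for which the scalar rule should give $\mathcal{I}_\eta(\eta)=\theta(1-\theta)$. The second is a normal $N(\mu,\sigma^2)$ model with $\mathcal{I}_{(\mu,\sigma)}=\operatorname{diag}(1/\sigma^2,2/\sigma^2)$, reparametrized by $\boldsymbol\eta=(\mu,\log\sigma)$ so that $\boldsymbol J=\operatorname{diag}(1,\sigma)$. The scalar derivative $d\theta/d\eta$ is taken by central differences so the check does not reuse the closed form; all function names are hypothetical.

```python
import math


def sigmoid(eta):
    """Map log-odds eta to the Bernoulli mean theta."""
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_fisher_theta(theta):
    """Fisher information of Bernoulli(theta) in the mean parametrization."""
    return 1.0 / (theta * (1.0 - theta))


def bernoulli_fisher_eta(eta, h=1e-6):
    """Scalar rule: I_eta(eta) = I_theta(theta(eta)) * (dtheta/deta)^2."""
    dtheta = (sigmoid(eta + h) - sigmoid(eta - h)) / (2.0 * h)
    return bernoulli_fisher_theta(sigmoid(eta)) * dtheta ** 2


def matmul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Matrix transpose of a list-of-rows matrix."""
    return [list(row) for row in zip(*a)]


def normal_fisher_mu_sigma(mu, sigma):
    """Fisher information matrix of N(mu, sigma^2) in the (mu, sigma) parametrization."""
    return [[1.0 / sigma ** 2, 0.0], [0.0, 2.0 / sigma ** 2]]


def normal_fisher_mu_logsigma(mu, log_sigma):
    """Vector rule with eta = (mu, log sigma): I_eta = J^T I_theta J,
    where J_ij = d theta_i / d eta_j = diag(1, sigma)."""
    sigma = math.exp(log_sigma)
    jac = [[1.0, 0.0], [0.0, sigma]]
    return matmul(matmul(transpose(jac), normal_fisher_mu_sigma(mu, sigma)), jac)


if __name__ == "__main__":
    for eta in (-1.5, 0.0, 2.0):
        p = sigmoid(eta)
        print(f"eta={eta:+.1f}  rule={bernoulli_fisher_eta(eta):.8f}  p(1-p)={p * (1 - p):.8f}")
    print(normal_fisher_mu_logsigma(0.0, math.log(2.0)))
```

In the $(\mu,\log\sigma)$ coordinates the $\log\sigma$ entry comes out as the constant 2, whatever the value of $\sigma$, while the $\mu$ entry stays $1/\sigma^2$.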
In information geometry, this is seen as a change of coordinates on a Riemannian manifold, and the intrinsic properties of curvature are unchanged under different parametrizations. In general, the Fisher information matrix provides a Riemannian metric (more precisely, the Fisher–Rao metric) for the manifold of thermodynamic states, and can be used as an information-geometric complexity measure for a classification of phase transitions, e.g., the scalar curvature of the thermodynamic metric tensor diverges at (and only at) a phase transition point. In the thermodynamic context, the Fisher information matrix is directly related to the rate of change in the corresponding order parameters. In particular, such relations identify second-order phase transitions via divergences of individual elements of the Fisher information matrix.
=== Isoperimetric inequality ===
The Fisher information matrix plays a role in an inequality analogous to the isoperimetric inequality. Of all probability distributions with a given entropy, the one whose Fisher information matrix has the smallest trace is the Gaussian distribution. This is analogous to the fact that, of all bounded sets with a given volume, the ball has the smallest surface area.

The proof involves taking a multivariate random variable $X$ with density function $f$ and adding a location parameter to form a family of densities $\{f(x-\theta)\mid\theta\in\mathbb{R}^{n}\}$. Then, by analogy with the Minkowski–Steiner formula, the "surface area" of $X$ is defined to be

$$S(X)=\lim_{\varepsilon\to 0}\frac{e^{H(X+Z_{\varepsilon})}-e^{H(X)}}{\varepsilon}$$
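A one-dimensional illustration of the entropy claim (in one dimension the trace is just the scalar Fisher information of the location family) compares three location families at equal entropy. The closed-form entropies and location Fisher informations used below are standard facts supplied for this sketch rather than taken from the text; for the Laplace density the score is defined everywhere except at the mode, which does not affect the integral. Each hypothetical helper solves for the scale parameter that gives entropy $h$ (in nats) and returns the corresponding Fisher information.

```python
import math

# Closed forms for one-dimensional location families (standard results, assumed here):
#   family     entropy H (nats)              location Fisher information J
#   Gaussian   0.5 * log(2*pi*e*sigma^2)     1 / sigma^2
#   logistic   log(s) + 2                    1 / (3 * s^2)
#   Laplace    1 + log(2*b)                  1 / b^2


def gaussian_fisher_at_entropy(h):
    sigma = math.exp(h) / math.sqrt(2.0 * math.pi * math.e)  # solve H = h for sigma
    return 1.0 / sigma ** 2


def logistic_fisher_at_entropy(h):
    s = math.exp(h - 2.0)  # solve H = h for the scale s
    return 1.0 / (3.0 * s ** 2)


def laplace_fisher_at_entropy(h):
    b = math.exp(h - 1.0) / 2.0  # solve H = h for the scale b
    return 1.0 / b ** 2


FAMILIES = {
    "Gaussian": gaussian_fisher_at_entropy,
    "logistic": logistic_fisher_at_entropy,
    "Laplace": laplace_fisher_at_entropy,
}

if __name__ == "__main__":
    for h in (-1.0, 0.0, 1.5):
        values = {name: fn(h) for name, fn in FAMILIES.items()}
        smallest = min(values, key=values.get)
        row = "  ".join(f"{k}={v:.4f}" for k, v in values.items())
        print(f"H={h:+.1f}  {row}  smallest: {smallest}")
```

Since every value scales as $e^{-2h}$, the ordering does not depend on $h$: the product $J\,e^{2H}/(2\pi e)$ equals 1 for the Gaussian, $e^{3}/(6\pi)\approx 1.066$ for the logistic and $2e/\pi\approx 1.731$ for the Laplace distribution.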