| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Fisher information | 5/8 | https://en.wikipedia.org/wiki/Fisher_information | reference | science, encyclopedia | 2026-05-05T09:50:15.726073+00:00 | kb-cron |
=== Multivariate normal distribution ===
The FIM for an N-variate multivariate normal distribution, $X \sim N\left(\mu(\theta),\, \Sigma(\theta)\right)$, has a special form. Let the K-dimensional vector of parameters be $\theta = \begin{bmatrix}\theta_1 & \dots & \theta_K\end{bmatrix}^{\textsf{T}}$ and the vector of normal random variables be $X = \begin{bmatrix}X_1 & \dots & X_N\end{bmatrix}^{\textsf{T}}$. Assume that the mean values of these random variables are $\mu(\theta) = \begin{bmatrix}\mu_1(\theta) & \dots & \mu_N(\theta)\end{bmatrix}^{\textsf{T}}$, and let $\Sigma(\theta)$ be the covariance matrix. Then, for $1 \leq m,\, n \leq K$, the (m, n) entry of the FIM is:
$$\mathcal{I}_{m,n} = \frac{\partial \mu^{\textsf{T}}}{\partial \theta_m} \Sigma^{-1} \frac{\partial \mu}{\partial \theta_n} + \frac{1}{2} \operatorname{tr}\left(\Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_m} \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_n}\right),$$
where $(\cdot)^{\textsf{T}}$ denotes the transpose of a vector, $\operatorname{tr}(\cdot)$ denotes the trace of a square matrix, and:
$$\begin{aligned}
\frac{\partial \mu}{\partial \theta_m} &= \begin{bmatrix} \dfrac{\partial \mu_1}{\partial \theta_m} & \dfrac{\partial \mu_2}{\partial \theta_m} & \cdots & \dfrac{\partial \mu_N}{\partial \theta_m} \end{bmatrix}^{\textsf{T}}; \\[8pt]
\frac{\partial \Sigma}{\partial \theta_m} &= \begin{bmatrix}
\dfrac{\partial \Sigma_{1,1}}{\partial \theta_m} & \dfrac{\partial \Sigma_{1,2}}{\partial \theta_m} & \cdots & \dfrac{\partial \Sigma_{1,N}}{\partial \theta_m} \\[5pt]
\dfrac{\partial \Sigma_{2,1}}{\partial \theta_m} & \dfrac{\partial \Sigma_{2,2}}{\partial \theta_m} & \cdots & \dfrac{\partial \Sigma_{2,N}}{\partial \theta_m} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial \Sigma_{N,1}}{\partial \theta_m} & \dfrac{\partial \Sigma_{N,2}}{\partial \theta_m} & \cdots & \dfrac{\partial \Sigma_{N,N}}{\partial \theta_m}
\end{bmatrix}.
\end{aligned}$$
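The entry-wise formula above translates fairly directly into array code. The following is a minimal NumPy sketch, not a reference implementation: the function name `gaussian_fim` and the array layout (derivatives stacked along the first axis) are assumptions made for illustration, and the explicit matrix inverse is used only for readability.

```python
import numpy as np

def gaussian_fim(dmu, dSigma, Sigma):
    """Fisher information matrix of X ~ N(mu(theta), Sigma(theta)).

    dmu    : (K, N) array; row m is d mu / d theta_m
    dSigma : (K, N, N) array; slice m is d Sigma / d theta_m
    Sigma  : (N, N) covariance matrix evaluated at theta
    (This layout is an illustrative assumption, not part of the source.)
    """
    Sigma_inv = np.linalg.inv(Sigma)
    # First term: (d mu/d theta_m)^T Sigma^{-1} (d mu/d theta_n)
    mean_term = dmu @ Sigma_inv @ dmu.T
    # A[m] = Sigma^{-1} (d Sigma/d theta_m); matmul broadcasts over m
    A = Sigma_inv @ dSigma
    # Second term: (1/2) tr(A[m] A[n]) for every pair (m, n)
    cov_term = 0.5 * np.einsum("mij,nji->mn", A, A)
    return mean_term + cov_term

# Illustrative check: scalar normal (N = 1) with theta = (mu, sigma^2)
sigma2 = 2.0
dmu = np.array([[1.0], [0.0]])          # d mu/d mu = 1, d mu/d sigma^2 = 0
dSigma = np.array([[[0.0]], [[1.0]]])   # d Sigma/d mu = 0, d Sigma/d sigma^2 = 1
fim = gaussian_fim(dmu, dSigma, np.array([[sigma2]]))
print(fim)
```

For this scalar case the result should agree with the familiar univariate matrix $\operatorname{diag}\left(1/\sigma^2,\ 1/(2\sigma^4)\right)$, which gives a quick sanity check on the index conventions.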
Note that a special, but very common, case is the one where $\Sigma(\theta) = \Sigma$, a constant. Then
$$\mathcal{I}_{m,n} = \frac{\partial \mu^{\textsf{T}}}{\partial \theta_m} \Sigma^{-1} \frac{\partial \mu}{\partial \theta_n}.$$
In this case the Fisher information matrix may be identified with the coefficient matrix of the normal equations of least squares estimation theory.

Another special case occurs when the mean and covariance depend on two different vector parameters, say, β and θ. This is especially popular in the analysis of spatial data, which often uses a linear model with correlated residuals. In this case,
$$\mathcal{I}(\beta, \theta) = \operatorname{diag}\left(\mathcal{I}(\beta),\, \mathcal{I}(\theta)\right)$$
where
$$\begin{aligned}
\mathcal{I}(\beta)_{m,n} &= \frac{\partial \mu^{\textsf{T}}}{\partial \beta_m} \Sigma^{-1} \frac{\partial \mu}{\partial \beta_n}, \\[5pt]
\mathcal{I}(\theta)_{m,n} &= \frac{1}{2} \operatorname{tr}\left(\Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_m} \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_n}\right)
\end{aligned}$$
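As an illustration of the block-diagonal case, the sketch below assumes a linear mean $\mu(\beta) = Z\beta$, so that $\partial\mu/\partial\beta_m$ is the m-th column of $Z$ and $\mathcal{I}(\beta) = Z^{\textsf{T}}\Sigma^{-1}Z$. The design matrix `Z`, the covariance family $\Sigma(\theta) = \theta_1 I + \theta_2 R$ with an exponential correlation matrix `R`, and the function name `block_fim` are all hypothetical choices made for this example; they are not taken from the source.

```python
import numpy as np

def block_fim(Z, Sigma, dSigma):
    """Block-diagonal FIM for mean Z @ beta and covariance Sigma(theta).

    Z      : (N, p) design matrix (hypothetical linear mean model)
    Sigma  : (N, N) covariance matrix evaluated at theta
    dSigma : (q, N, N) array; slice m is d Sigma / d theta_m
    """
    Sigma_inv = np.linalg.inv(Sigma)
    # I(beta) = Z^T Sigma^{-1} Z, since d mu / d beta_m is column m of Z
    I_beta = Z.T @ Sigma_inv @ Z
    # I(theta)[m, n] = (1/2) tr(Sigma^{-1} dSigma_m Sigma^{-1} dSigma_n)
    A = Sigma_inv @ dSigma
    I_theta = 0.5 * np.einsum("mij,nji->mn", A, A)
    # Assemble diag(I(beta), I(theta)); the cross blocks stay zero
    p, q = I_beta.shape[0], I_theta.shape[0]
    fim = np.zeros((p + q, p + q))
    fim[:p, :p] = I_beta
    fim[p:, p:] = I_theta
    return fim

# Three sites on a line, intercept + slope mean, nugget + exponential covariance
coords = np.array([0.0, 1.0, 2.0])
Z = np.column_stack([np.ones(3), coords])
R = np.exp(-np.abs(coords[:, None] - coords[None, :]))
theta = np.array([0.5, 1.0])
Sigma = theta[0] * np.eye(3) + theta[1] * R
dSigma = np.stack([np.eye(3), R])   # d Sigma/d theta_1 = I, d Sigma/d theta_2 = R
fim = block_fim(Z, Sigma, dSigma)
print(fim.round(4))
```

The zero off-diagonal blocks reflect that, in this setting, the mean parameters and the covariance parameters carry no cross information.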
== Properties ==
=== Chain rule ===
Similar to the entropy or mutual information, the Fisher information also possesses a chain rule decomposition. In particular, if X and Y are jointly distributed random variables, it follows that:
$$\mathcal{I}_{X,Y}(\theta) = \mathcal{I}_{X}(\theta) + \mathcal{I}_{Y\mid X}(\theta),$$
where $\mathcal{I}_{Y\mid X}(\theta) = \operatorname{E}_{X}\left[\mathcal{I}_{Y\mid X=x}(\theta)\right]$ and $\mathcal{I}_{Y\mid X=x}(\theta)$ is the Fisher information of Y relative to $\theta$ calculated with respect to the conditional density of Y given a specific value X = x. As a special case, if the two random variables are independent, the information yielded by the two random variables is the sum of the information from each random variable separately:
$$\mathcal{I}_{X,Y}(\theta) = \mathcal{I}_{X}(\theta) + \mathcal{I}_{Y}(\theta).$$
Consequently, the information in a random sample of n independent and identically distributed observations is n times the information in a sample of size 1.
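Both statements can be checked numerically in a simple Gaussian setting. The sketch below is an illustration under stated assumptions rather than a general proof: it takes a scalar parameter θ that enters only through the mean, a bivariate normal (X, Y) with common mean θ, unit variances and correlation `rho`, and then n i.i.d. observations from N(θ, σ²). The helper name `mean_fim` is hypothetical.

```python
import numpy as np

def mean_fim(dmu, Sigma):
    """FIM when only the mean depends on theta: dmu Sigma^{-1} dmu^T.

    dmu   : (K, N) array; row m is d mu / d theta_m
    Sigma : (N, N) constant covariance matrix
    """
    return dmu @ np.linalg.solve(Sigma, dmu.T)

# Chain rule: (X, Y) jointly normal, both with mean theta, correlation rho
rho = 0.5
Sigma_xy = np.array([[1.0, rho], [rho, 1.0]])
info_joint = mean_fim(np.ones((1, 2)), Sigma_xy)[0, 0]
info_x = mean_fim(np.ones((1, 1)), Sigma_xy[:1, :1])[0, 0]
# Y | X = x ~ N(theta + rho * (x - theta), 1 - rho^2), so the conditional
# mean has derivative (1 - rho) in theta. The conditional information does
# not depend on x here, so the expectation over X leaves it unchanged.
info_y_given_x = (1.0 - rho) ** 2 / (1.0 - rho ** 2)
print(info_joint, info_x + info_y_given_x)

# Additivity: n i.i.d. observations from N(theta, sigma2)
n, sigma2 = 5, 4.0
info_one = mean_fim(np.ones((1, 1)), np.array([[sigma2]]))[0, 0]
info_n = mean_fim(np.ones((1, n)), sigma2 * np.eye(n))[0, 0]
print(info_n, n * info_one)
```

In the correlated case the joint information is smaller than the sum of the two marginal informations, because Y adds only what X has not already revealed about θ; with `rho = 0` the conditional term reduces to the marginal one and plain additivity is recovered.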
=== f-divergence ===