| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Fisher information | 2/8 | https://en.wikipedia.org/wiki/Fisher_information | reference | science, encyclopedia | 2026-05-05T09:50:15.726073+00:00 | kb-cron |
1. The partial derivative of f(X; θ) with respect to θ exists almost everywhere. (It can fail to exist on a null set, as long as this set does not depend on θ.)
2. The integral of f(X; θ) can be differentiated under the integral sign with respect to θ.
3. The support of f(X; θ) does not depend on θ.

If θ is a vector, the regularity conditions must hold for every component of θ. It is easy to find a density that does not satisfy the regularity conditions: the density of a Uniform(0, θ) variable fails conditions 1 and 3. In this case, even though the Fisher information can be computed from the definition, it will not have the properties it is typically assumed to have.
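To see concretely what goes wrong for Uniform(0, θ), the Python sketch below (all function names are illustrative, not from any library) approximates two quantities numerically: the θ-derivative of ∫ f dx, which is zero because the density integrates to 1 for every θ, and ∫ ∂f/∂θ dx, which equals the expected score. If differentiation under the integral sign were valid these would coincide; here the second comes out near −1/θ rather than 0, because the support (0, θ) moves with θ.

```python
# Sketch: what breaks for Uniform(0, theta), whose support moves with theta.

def density(x: float, theta: float) -> float:
    """f(x; theta) = 1/theta on (0, theta) and 0 elsewhere."""
    return 1.0 / theta if 0.0 < x < theta else 0.0


def midpoint_integral(g, a: float, b: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def derivative_of_integral(theta: float, eps: float = 1e-4) -> float:
    """Central difference in theta of the integral of f(x; theta) over (0, theta)."""
    def total(t: float) -> float:
        return midpoint_integral(lambda x: density(x, t), 0.0, t)
    return (total(theta + eps) - total(theta - eps)) / (2 * eps)


def integral_of_derivative(theta: float, eps: float = 1e-6) -> float:
    """Integral over (0, theta) of the pointwise derivative df/dtheta.

    Because f * dlog(f)/dtheta = df/dtheta, this is also the expected score."""
    def dfdtheta(x: float) -> float:
        return (density(x, theta + eps) - density(x, theta - eps)) / (2 * eps)
    return midpoint_integral(dfdtheta, 0.0, theta)


theta = 2.0
print(derivative_of_integral(theta))  # near 0: the density integrates to 1 for every theta
print(integral_of_derivative(theta))  # near -1/theta: the expected score is not 0
```

The gap between the two numbers is the boundary term f(θ; θ) = 1/θ that appears when the limits of integration depend on θ.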
=== In terms of likelihood ===
Because the likelihood of θ given X is always proportional to the probability f(X; θ), their logarithms differ by a constant that is independent of θ, so their derivatives with respect to θ are equal. Thus one can substitute a log-likelihood l(θ; X) for log f(X; θ) in the definitions of the Fisher information.
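As a small check of this argument, the Python sketch below uses a single Poisson observation (an example assumed here, not taken from the text). It differentiates, by central finite differences, both the full log-pmf, which contains the θ-free term −log x!, and a log-likelihood with that term dropped. Both derivatives should approximate the same score, x/λ − 1.

```python
import math


def log_pmf(x: int, lam: float) -> float:
    """log f(x; lambda) for a Poisson variable, including the -log(x!) term."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_likelihood_kernel(x: int, lam: float) -> float:
    """A log-likelihood l(lambda; x) that drops the lambda-free term -log(x!)."""
    return x * math.log(lam) - lam


def central_diff(g, lam: float, eps: float = 1e-6) -> float:
    """Central finite-difference approximation of dg/dlambda."""
    return (g(lam + eps) - g(lam - eps)) / (2 * eps)


x, lam = 7, 3.0
s1 = central_diff(lambda t: log_pmf(x, t), lam)
s2 = central_diff(lambda t: log_likelihood_kernel(x, t), lam)
print(s1, s2, x / lam - 1)  # the two numerical scores match the exact score
```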
=== Samples of any size ===
The value X can represent a single sample drawn from a single distribution, or a collection of samples drawn from a collection of distributions. If there are n samples and the corresponding n distributions are statistically independent, then the Fisher information is the sum of the n single-sample Fisher information values, one for each sample from its distribution. In particular, if the n distributions are independent and identically distributed (i.i.d.), then the Fisher information of the whole sample is n times the Fisher information of a single observation from the common distribution.
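A quick numerical check of the i.i.d. case, written as a Python sketch under an assumed model (n Bernoulli(p) observations; the function name is hypothetical). Because the joint probability depends on the data only through the number k of ones, the exact Fisher information E[score²] can be summed over k with binomial weights and compared with n times the single-observation value 1/(p(1 − p)).

```python
from math import comb


def fisher_info_bernoulli_sample(n: int, p: float) -> float:
    """Exact E[score^2] for n i.i.d. Bernoulli(p) observations.

    The log-likelihood is k log p + (n - k) log(1 - p), so the score is
    (k - n p) / (p (1 - p)); we average its square over k ~ Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        score = (k - n * p) / (p * (1 - p))
        total += prob * score**2
    return total


p = 0.3
single = fisher_info_bernoulli_sample(1, p)  # equals 1 / (p (1 - p))
for n in (1, 5, 20):
    print(n, fisher_info_bernoulli_sample(n, p), n * single)
```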
=== Informal derivation of the Cramér–Rao bound ===
The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of θ. Van Trees (1968) and Frieden (2004) provide the following method of deriving the Cramér–Rao bound, a result that illustrates how the Fisher information is used. Informally, we begin by considering an unbiased estimator $\hat{\theta}(X)$. Mathematically, "unbiased" means that
$$\operatorname{E}\left[\hat{\theta}(X) - \theta \mid \theta\right] = \int \left(\hat{\theta}(x) - \theta\right) f(x;\theta)\,dx = 0 \quad \text{regardless of the value of } \theta.$$
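For a discrete X the integral becomes a sum over outcomes. As an illustration (an assumed example, not from the text), take n Bernoulli(θ) observations, written p in the code, with the sample mean θ̂ = k/n as the estimator. The Python sketch sums (θ̂ − θ) f over outcomes grouped by the count k and gets zero, up to rounding, for every θ tried.

```python
from math import comb


def expected_error(n: int, p: float) -> float:
    """E[theta_hat - theta] for theta_hat = k/n, the sample mean of n Bernoulli(p) draws.

    Outcomes are grouped by k = number of ones, which has probability
    C(n, k) p^k (1 - p)^(n - k)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (k / n - p)
        for k in range(n + 1)
    )


for p in (0.1, 0.5, 0.9):
    print(p, expected_error(10, p))  # zero up to floating-point rounding
```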
This expression is zero for every θ, so its partial derivative with respect to θ must also be zero. Differentiating under the integral sign, as the regularity conditions permit, and applying the product rule, this partial derivative is also equal to
$$0 = \frac{\partial}{\partial\theta} \int \left(\hat{\theta}(x) - \theta\right) f(x;\theta)\,dx = \int \left(\hat{\theta}(x) - \theta\right) \frac{\partial f}{\partial\theta}\,dx - \int f\,dx.$$
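The two terms on the right can be computed separately in a small discrete case. This Python sketch again assumes n i.i.d. Bernoulli(θ) observations (θ written p in the code) and the sample mean θ̂ = k/n as the unbiased estimator; ∂f/∂θ is taken analytically. Both terms come out equal to 1, so their difference is 0, as the equation requires.

```python
from math import comb


def split_terms(n: int, p: float):
    """Return the two integrals on the right-hand side, as sums over k = 0..n.

    Outcomes of n i.i.d. Bernoulli(p) draws are grouped by k = number of ones:
    f(k; p) = C(n, k) p^k (1 - p)^(n - k), and theta_hat = k / n."""
    first = second = 0.0
    for k in range(n + 1):
        f = comb(n, k) * p**k * (1 - p) ** (n - k)
        # Analytic derivative of f(k; p) with respect to p (product rule).
        df = comb(n, k) * (
            k * p ** (k - 1) * (1 - p) ** (n - k)
            - (n - k) * p**k * (1 - p) ** (n - k - 1)
        )
        first += (k / n - p) * df  # integral of (theta_hat - theta) * df/dtheta
        second += f                # integral of f
    return first, second


first, second = split_terms(10, 0.3)
print(first, second, first - second)  # both terms are 1 up to rounding
```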
For each $\theta$, $f(x;\theta)$ is a probability density function in $x$, and therefore $\int f\,dx = 1$. By applying the chain rule to $\log f$, which gives $\partial \log f/\partial\theta = (1/f)\,\partial f/\partial\theta$, and then multiplying through by $f(x;\theta)$, one can verify that
$$\frac{\partial f}{\partial\theta} = f\,\frac{\partial \log f}{\partial\theta}.$$
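This identity can be spot-checked numerically. The Python sketch below assumes a normal density with mean θ and known σ = 1 (an example chosen here, not from the text) and compares a central-difference estimate of ∂f/∂θ with f times a central-difference estimate of ∂ log f/∂θ. For this density both should also equal f·(x − θ)/σ².

```python
import math


def normal_pdf(x: float, theta: float, sigma: float = 1.0) -> float:
    """Density of N(theta, sigma^2) at x."""
    z = (x - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def d_dtheta(g, theta: float, eps: float = 1e-6) -> float:
    """Central finite-difference approximation of dg/dtheta."""
    return (g(theta + eps) - g(theta - eps)) / (2 * eps)


x, theta = 0.7, 0.2
lhs = d_dtheta(lambda t: normal_pdf(x, t), theta)                            # df/dtheta
rhs = normal_pdf(x, theta) * d_dtheta(lambda t: math.log(normal_pdf(x, t)), theta)  # f * dlog f/dtheta
print(lhs, rhs, normal_pdf(x, theta) * (x - theta))  # all three agree closely
```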
Using these two facts in the above, we get
$$\int \left(\hat{\theta} - \theta\right) f\,\frac{\partial \log f}{\partial\theta}\,dx = 1.$$
Writing $f = \sqrt{f}\,\sqrt{f}$ and factoring the integrand gives
$$\int \left(\left(\hat{\theta} - \theta\right)\sqrt{f}\right)\left(\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right)dx = 1.$$
Squaring both sides and applying the Cauchy–Schwarz inequality to the two factors of the integrand yields
$$1 = \biggl(\int \left[\left(\hat{\theta} - \theta\right)\sqrt{f}\right] \cdot \left[\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right] dx\biggr)^{2} \le \left[\int \left(\hat{\theta} - \theta\right)^{2} f\,dx\right] \cdot \left[\int \left(\frac{\partial \log f}{\partial\theta}\right)^{2} f\,dx\right].$$
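The inequality can be checked exactly in a small discrete case, where each integral becomes a sum over all outcomes. This Python sketch assumes n i.i.d. Bernoulli(θ) observations (θ written p in the code) and two unbiased estimators chosen for illustration: the sample mean and the first observation alone. For both, the squared cross term is 1; the product of the two bracketed factors is 1 for the sample mean (equality) and n for the first observation (strict inequality).

```python
from itertools import product


def cs_terms(estimator, n: int, p: float):
    """Return (cross, mse, info) for n i.i.d. Bernoulli(p) observations.

    cross = sum of (theta_hat - p) * f * score   (the integral equal to 1)
    mse   = sum of (theta_hat - p)^2 * f         (first bracketed factor)
    info  = sum of score^2 * f                   (second bracketed factor)"""
    cross = mse = info = 0.0
    for x in product((0, 1), repeat=n):
        k = sum(x)
        f = p**k * (1 - p) ** (n - k)            # probability of this exact outcome
        score = (k - n * p) / (p * (1 - p))      # d/dp of log f
        err = estimator(x) - p
        cross += err * f * score
        mse += err**2 * f
        info += score**2 * f
    return cross, mse, info


n, p = 6, 0.35
estimators = {
    "sample mean": lambda x: sum(x) / len(x),
    "first observation only": lambda x: x[0],
}
for name, est in estimators.items():
    cross, mse, info = cs_terms(est, n, p)
    print(name, cross**2, mse * info)  # left side vs right side of the inequality
```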
The second bracketed factor is, by definition, the Fisher information $\mathcal{I}(\theta)$, while the first bracketed factor is the mean squared error (MSE) of the estimator $\hat{\theta}$. Since the estimator is unbiased, its MSE equals its variance. Dividing both sides by $\mathcal{I}(\theta)$, which is assumed to be positive, the inequality tells us that
$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{\mathcal{I}(\theta)}.$$
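A Monte Carlo sketch of the bound in Python, under assumed settings (normal data with known σ and unknown mean θ; the helper name, seed, and all numbers are illustrative). For n observations from N(θ, σ²), the Fisher information is I(θ) = n/σ², so the bound is σ²/n. The sample mean should attain it, while the sample median, which is also unbiased here because the normal density is symmetric about θ, should show a noticeably larger variance.

```python
import random
import statistics


def simulate_variances(n: int = 25, theta: float = 1.0, sigma: float = 2.0,
                       reps: int = 10_000, seed: int = 0):
    """Empirical variances of the sample mean and sample median over many samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


n, sigma = 25, 2.0
bound = sigma**2 / n  # 1 / I(theta), with I(theta) = n / sigma^2 for N(theta, sigma^2)
var_mean, var_median = simulate_variances(n=n, sigma=sigma)
print(bound, var_mean, var_median)
```

The simulated variances fluctuate with the seed and the number of replications; only their comparison with the bound is meaningful.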