---
title: "Fisher information"
chunk: 3/8
source: "https://en.wikipedia.org/wiki/Fisher_information"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:15.726073+00:00"
instance: "kb-cron"
---
In other words, the precision to which we can estimate θ is fundamentally limited by the Fisher information of the likelihood function.
Alternatively, the same conclusion can be obtained directly from the Cauchy–Schwarz inequality for random variables, $|\operatorname{Cov}(A,B)|^{2}\leq \operatorname{Var}(A)\operatorname{Var}(B)$, applied to the random variables $\hat{\theta}(X)$ and $\partial_{\theta}\log f(X;\theta)$, and observing that for unbiased estimators we have
$$\operatorname{Cov}[\hat{\theta}(X),\partial_{\theta}\log f(X;\theta)]=\int \hat{\theta}(x)\,\partial_{\theta}f(x;\theta)\,dx=\partial_{\theta}\operatorname{E}[\hat{\theta}]=1.$$
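
The covariance identity and the resulting bound can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes n independent Bernoulli(θ) trials as the model, and the helper names `exact_moments`, `sample_mean`, and `first_only` are hypothetical, not part of the derivation above. It sums over all 2^n outcomes to obtain the covariance between an unbiased estimator and the score, together with both variances.

```python
from itertools import product


def exact_moments(estimator, theta, n):
    """Return (cov, var_est, var_score) by summing over all 2**n outcomes.

    Assumes n independent Bernoulli(theta) trials with 0 < theta < 1.
    """
    e_est = e_score = e_est2 = e_score2 = e_cross = 0.0
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        prob = theta**k * (1 - theta) ** (n - k)
        est = estimator(xs)
        # Score: derivative of the log-likelihood with respect to theta.
        score = k / theta - (n - k) / (1 - theta)
        e_est += prob * est
        e_score += prob * score
        e_est2 += prob * est**2
        e_score2 += prob * score**2
        e_cross += prob * est * score
    cov = e_cross - e_est * e_score
    return cov, e_est2 - e_est**2, e_score2 - e_score**2


def sample_mean(xs):
    # Unbiased estimator that uses every trial.
    return sum(xs) / len(xs)


def first_only(xs):
    # Unbiased but wasteful: ignores every trial after the first.
    return xs[0]


if __name__ == "__main__":
    for name, est in (("sample_mean", sample_mean), ("first_only", first_only)):
        cov, var_est, var_score = exact_moments(est, theta=0.3, n=4)
        # Columns: estimator, covariance with score, Var(estimator), 1/Var(score)
        print(name, cov, var_est, 1 / var_score)
```

For both estimators the covariance with the score should come out as 1 up to rounding, while only the sample mean has a variance equal to the reciprocal of the score variance; the estimator that discards data has a strictly larger variance, as the Cauchy–Schwarz argument permits.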
== Examples ==
=== Single-parameter Bernoulli experiment ===
A Bernoulli trial is a random variable with two possible outcomes, 0 and 1, with 1 having a probability of θ. The outcome can be thought of as determined by the toss of a biased coin, with the probability of heads (1) being θ and the probability of tails (0) being 1 − θ.
Let X be a single Bernoulli trial, that is, one sample from the distribution. The Fisher information contained in X may be calculated to be:
$$\begin{aligned}
\mathcal{I}(\theta) &= -\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\log\left(\theta^{X}(1-\theta)^{1-X}\right)\,\right|\,\theta\right]\\[5pt]
&= -\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\left(X\log\theta+(1-X)\log(1-\theta)\right)\,\right|\,\theta\right]\\[5pt]
&= \operatorname{E}\left[\left.\frac{X}{\theta^{2}}+\frac{1-X}{(1-\theta)^{2}}\,\right|\,\theta\right]\\[5pt]
&= \frac{\theta}{\theta^{2}}+\frac{1-\theta}{(1-\theta)^{2}}\\[5pt]
&= \frac{1}{\theta(1-\theta)}.
\end{aligned}$$
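
As a small check of this calculation, the following sketch evaluates the single-trial Fisher information in two ways: as the negative expected second derivative of the log-likelihood (the route taken above) and as the expected squared score. The function name `fisher_information_single` is a hypothetical helper, and the derivatives are hard-coded for the Bernoulli log-likelihood rather than computed symbolically.

```python
def fisher_information_single(theta):
    """Fisher information of one Bernoulli(theta) trial, computed two ways.

    Assumes 0 < theta < 1. Returns (from_curvature, from_score).
    """
    pmf = {0: 1 - theta, 1: theta}

    def score(x):
        # First derivative of x*log(theta) + (1 - x)*log(1 - theta).
        return x / theta - (1 - x) / (1 - theta)

    def second_derivative(x):
        # Second derivative of the same log-likelihood.
        return -x / theta**2 - (1 - x) / (1 - theta) ** 2

    from_curvature = -sum(p * second_derivative(x) for x, p in pmf.items())
    from_score = sum(p * score(x) ** 2 for x, p in pmf.items())
    return from_curvature, from_score


if __name__ == "__main__":
    for theta in (0.1, 0.5, 0.9):
        print(theta, fisher_information_single(theta), 1 / (theta * (1 - theta)))
```

Both values should agree with 1/(θ(1 − θ)) up to floating-point rounding.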
Because Fisher information is additive, the Fisher information contained in n independent Bernoulli trials is therefore
$$\mathcal{I}(\theta)=\frac{n}{\theta(1-\theta)}.$$
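
Additivity can also be verified directly. The sketch below is one possible check, not the article's own method: it works with the number of successes k in n trials (a binomial count, chosen here for convenience) and computes the expected squared score. The function name `fisher_information_n_trials` is hypothetical.

```python
from math import comb


def fisher_information_n_trials(theta, n):
    """Expected squared score for n independent Bernoulli(theta) trials.

    Assumes 0 < theta < 1 and sums over the number of successes k.
    """
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * theta**k * (1 - theta) ** (n - k)
        # The binomial coefficient does not depend on theta, so the score
        # of the count k equals the score of any sequence with k successes.
        score = k / theta - (n - k) / (1 - theta)
        total += prob * score**2
    return total


if __name__ == "__main__":
    theta = 0.3
    for n in (1, 5, 20):
        print(n, fisher_information_n_trials(theta, n), n / (theta * (1 - theta)))
```

The computed value should grow linearly in n and match n/(θ(1 − θ)) up to rounding.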
If $x_{i}$ is one of the $2^{n}$ possible outcomes of n independent Bernoulli trials and $x_{ij}$ is the result of the j-th trial within the i-th outcome, then the probability of $x_{i}$ is given by
$$p(x_{i},\theta)=\prod_{j=1}^{n}\theta^{x_{ij}}(1-\theta)^{1-x_{ij}}.$$
The sample mean of the i-th outcome is $\mu_{i}=(1/n)\sum_{j=1}^{n}x_{ij}$. The expected value of the sample mean (over the sampling distribution) is
$$E(\mu)=\sum_{x_{i}}\mu_{i}\,p(x_{i},\theta)=\theta,$$
where the sum is over all $2^{n}$ possible trial outcomes. The expected value of the square of the sample mean is
$$E(\mu^{2})=\sum_{x_{i}}\mu_{i}^{2}\,p(x_{i},\theta)=\frac{(1+(n-1)\theta)\,\theta}{n},$$
so the variance in the value of the mean is
$$E(\mu^{2})-E(\mu)^{2}=\frac{\theta(1-\theta)}{n}.$$
It is seen that the Fisher information is the reciprocal of the variance of the mean number of successes in n Bernoulli trials. This is generally true. In this case, the Cramér–Rao bound is an equality.
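
The sums over all $2^{n}$ outcomes used in this example can be carried out by brute force for small n. The sketch below uses exact rational arithmetic so the closed forms can be compared without rounding; the function name `sample_mean_moments` and the chosen values θ = 3/10, n = 5 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def sample_mean_moments(theta, n):
    """Exact E(mu) and E(mu**2) over all 2**n outcome sequences.

    theta should be a Fraction (or an int/str convertible to one) in (0, 1).
    """
    theta = Fraction(theta)
    e_mu = e_mu2 = Fraction(0)
    for xs in product((0, 1), repeat=n):
        # Probability of this particular sequence of n trial results.
        prob = Fraction(1)
        for x in xs:
            prob *= theta**x * (1 - theta) ** (1 - x)
        mu = Fraction(sum(xs), n)
        e_mu += mu * prob
        e_mu2 += mu**2 * prob
    return e_mu, e_mu2


if __name__ == "__main__":
    theta, n = Fraction(3, 10), 5
    e_mu, e_mu2 = sample_mean_moments(theta, n)
    variance = e_mu2 - e_mu**2
    fisher = n / (theta * (1 - theta))
    # The last two printed values should be reciprocals of each other.
    print(e_mu, e_mu2, variance, fisher)
```

Because the enumeration grows as $2^{n}$, this is only practical for small n, but it is enough to confirm that the variance of the sample mean equals the reciprocal of the Fisher information for the chosen values.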