| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Fisher information | 3/8 | https://en.wikipedia.org/wiki/Fisher_information | reference | science, encyclopedia | 2026-05-05T09:50:15.726073+00:00 | kb-cron |
In other words, the precision to which we can estimate θ is fundamentally limited by the Fisher information of the likelihood function. Alternatively, the same conclusion can be obtained directly from the Cauchy–Schwarz inequality for random variables, $|\operatorname{Cov}(A,B)|^{2}\leq \operatorname{Var}(A)\operatorname{Var}(B)$, applied to the random variables $\hat{\theta}(X)$ and $\partial_{\theta}\log f(X;\theta)$, and observing that for unbiased estimators we have

$$\operatorname{Cov}[\hat{\theta}(X),\partial_{\theta}\log f(X;\theta)]=\int \hat{\theta}(x)\,\partial_{\theta}f(x;\theta)\,dx=\partial_{\theta}\operatorname{E}[\hat{\theta}]=1.$$
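The last step of this argument can be written out explicitly. The following is a sketch that assumes the usual regularity conditions (differentiation under the integral sign is valid and the score $\partial_{\theta}\log f(X;\theta)$ has mean zero), so that the variance of the score equals the Fisher information $\mathcal{I}(\theta)$:

$$1=\left|\operatorname{Cov}[\hat{\theta}(X),\partial_{\theta}\log f(X;\theta)]\right|^{2}\leq \operatorname{Var}(\hat{\theta}(X))\,\operatorname{Var}(\partial_{\theta}\log f(X;\theta))=\operatorname{Var}(\hat{\theta}(X))\,\mathcal{I}(\theta)\quad\Longrightarrow\quad\operatorname{Var}(\hat{\theta}(X))\geq \frac{1}{\mathcal{I}(\theta)}.$$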
== Examples ==
=== Single-parameter Bernoulli experiment ===
A Bernoulli trial is a random variable with two possible outcomes, 0 and 1, with 1 having a probability of θ. The outcome can be thought of as determined by the toss of a biased coin, with the probability of heads (1) being θ and the probability of tails (0) being 1 − θ. Let X be a single Bernoulli trial drawn from this distribution. The Fisher information contained in X may be calculated to be:
$$\begin{aligned}\mathcal{I}(\theta)&=-\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\log\left(\theta^{X}(1-\theta)^{1-X}\right)\,\right|\,\theta\right]\\[5pt]&=-\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\left(X\log\theta+(1-X)\log(1-\theta)\right)\,\right|\,\theta\right]\\[5pt]&=\operatorname{E}\left[\left.\frac{X}{\theta^{2}}+\frac{1-X}{(1-\theta)^{2}}\,\right|\,\theta\right]\\[5pt]&=\frac{\theta}{\theta^{2}}+\frac{1-\theta}{(1-\theta)^{2}}\\[5pt]&=\frac{1}{\theta(1-\theta)}.\end{aligned}$$
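The calculation above can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function names are made up for illustration, and the expectation is taken exactly by summing over the two outcomes x = 0 and x = 1. It also evaluates the second moment of the score, which should agree with the negative expected second derivative.

```python
import math


def score(x: int, theta: float) -> float:
    """First derivative in theta of log(theta^x * (1 - theta)^(1 - x))."""
    return x / theta - (1 - x) / (1 - theta)


def neg_second_derivative(x: int, theta: float) -> float:
    """Negative second derivative in theta of the same log-likelihood."""
    return x / theta**2 + (1 - x) / (1 - theta) ** 2


def bernoulli_expectation(g, theta: float) -> float:
    """Exact E[g(X, theta)] for X ~ Bernoulli(theta), summing over x in {0, 1}."""
    return (1 - theta) * g(0, theta) + theta * g(1, theta)


def fisher_information_single_trial(theta: float) -> float:
    """Fisher information of one trial as -E[d^2/dtheta^2 log-likelihood]."""
    return bernoulli_expectation(neg_second_derivative, theta)


def score_second_moment(theta: float) -> float:
    """E[score^2]; equals the Fisher information because the score has mean zero."""
    return bernoulli_expectation(lambda x, t: score(x, t) ** 2, theta)


if __name__ == "__main__":
    for theta in (0.1, 0.5, 0.8):
        closed_form = 1 / (theta * (1 - theta))
        assert math.isclose(fisher_information_single_trial(theta), closed_form)
        assert math.isclose(score_second_moment(theta), closed_form)
        print(theta, closed_form)
```

Both quantities match the closed form 1/(θ(1 − θ)), which grows without bound as θ approaches 0 or 1.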
Because Fisher information is additive, the Fisher information contained in n independent Bernoulli trials is therefore
$$\mathcal{I}(\theta)=\frac{n}{\theta(1-\theta)}.$$
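Additivity can also be checked without invoking it directly. The sketch below (the function name is hypothetical) uses the fact that the log-likelihood of n independent trials depends on the data only through the number of successes k, which follows a binomial distribution; it then takes the exact expectation of the negative second derivative over k.

```python
import math


def fisher_information_n_trials(n: int, theta: float) -> float:
    """-E[d^2/dtheta^2 log-likelihood] for n independent Bernoulli(theta) trials.

    The log-likelihood of a sequence with k successes is
    k*log(theta) + (n - k)*log(1 - theta), so the expectation is taken
    over k ~ Binomial(n, theta).
    """
    total = 0.0
    for k in range(n + 1):
        prob_k = math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
        neg_second_derivative = k / theta**2 + (n - k) / (1 - theta) ** 2
        total += prob_k * neg_second_derivative
    return total


if __name__ == "__main__":
    theta = 0.3
    for n in (1, 5, 20):
        closed_form = n / (theta * (1 - theta))
        assert math.isclose(fisher_information_n_trials(n, theta), closed_form)
        print(n, closed_form)
```

For each n the result agrees with n times the single-trial value, as additivity predicts.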
If $x_{i}$ is one of the $2^{n}$ possible outcomes of n independent Bernoulli trials and $x_{ij}$ is the result of the j-th trial within the i-th outcome, then the probability of $x_{i}$ is given by

$$p(x_{i},\theta)=\prod_{j=1}^{n}\theta^{x_{ij}}(1-\theta)^{1-x_{ij}}.$$
The sample mean of the i-th outcome is $\mu_{i}=(1/n)\sum_{j=1}^{n}x_{ij}$. The expected value of the sample mean (over the sampling distribution) is

$$E(\mu)=\sum_{x_{i}}\mu_{i}\,p(x_{i},\theta)=\theta,$$

where the sum is over all $2^{n}$ possible trial outcomes. The expected value of the square of the sample mean is
$$E(\mu^{2})=\sum_{x_{i}}\mu_{i}^{2}\,p(x_{i},\theta)=\frac{(1+(n-1)\theta)\theta}{n},$$

so the variance in the value of the mean is

$$E(\mu^{2})-E(\mu)^{2}=\frac{\theta(1-\theta)}{n}.$$
The Fisher information is thus the reciprocal of the variance of the mean number of successes in n Bernoulli trials, and this holds for every n. In this case the Cramér–Rao bound is attained with equality: the sample mean is an unbiased estimator of θ whose variance equals $1/\mathcal{I}(\theta)$.
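The moments of the sample mean can be verified by brute force for small n. The sketch below enumerates all 2^n outcome sequences, weights each by its probability (taken as the product of θ for each success and 1 − θ for each failure), and accumulates the first and second moments of the sample mean. The function names are illustrative only, and the enumeration is only practical for small n.

```python
import itertools
import math


def sequence_probability(x, theta: float) -> float:
    """Probability of one outcome sequence x of independent Bernoulli(theta) trials."""
    return math.prod(theta**xj * (1 - theta) ** (1 - xj) for xj in x)


def sample_mean_moments(n: int, theta: float):
    """Return (E[mu], E[mu^2]) by summing over all 2**n outcome sequences."""
    first = 0.0
    second = 0.0
    for x in itertools.product((0, 1), repeat=n):
        p = sequence_probability(x, theta)
        mu = sum(x) / n  # sample mean of this sequence
        first += mu * p
        second += mu**2 * p
    return first, second


if __name__ == "__main__":
    n, theta = 6, 0.3
    first, second = sample_mean_moments(n, theta)
    variance = second - first**2
    fisher_information = n / (theta * (1 - theta))

    assert math.isclose(first, theta)
    assert math.isclose(second, (1 + (n - 1) * theta) * theta / n)
    assert math.isclose(variance, theta * (1 - theta) / n)
    # The variance of the sample mean equals the Cramér–Rao lower bound.
    assert math.isclose(variance, 1 / fisher_information)
    print(first, second, variance)
```

The final assertion is the equality case of the Cramér–Rao bound for this estimator.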