---
title: "Fisher information"
chunk: 3/8
source: "https://en.wikipedia.org/wiki/Fisher_information"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:15.726073+00:00"
instance: "kb-cron"
---

In other words, the precision to which we can estimate θ is fundamentally limited by the Fisher information of the likelihood function.

Alternatively, the same conclusion can be obtained directly from the Cauchy–Schwarz inequality for random variables,

$$|\operatorname{Cov}(A,B)|^{2} \le \operatorname{Var}(A)\operatorname{Var}(B),$$

applied to the random variables $\hat\theta(X)$ and $\partial_\theta \log f(X;\theta)$. Because the score $\partial_\theta \log f(X;\theta)$ has mean zero, for an unbiased estimator

$$\operatorname{Cov}\left[\hat\theta(X),\, \partial_\theta \log f(X;\theta)\right] = \int \hat\theta(x)\,\partial_\theta f(x;\theta)\,dx = \partial_\theta \operatorname{E}[\hat\theta] = 1,$$

and since $\operatorname{Var}\left[\partial_\theta \log f(X;\theta)\right] = \mathcal{I}(\theta)$, the inequality gives $\operatorname{Var}(\hat\theta) \ge 1/\mathcal{I}(\theta)$.

== Examples ==

=== Single-parameter Bernoulli experiment ===

A Bernoulli trial is a random variable with two possible outcomes, 0 and 1, with 1 having probability θ. The outcome can be thought of as determined by the toss of a biased coin, with the probability of heads (1) being θ and the probability of tails (0) being 1 − θ.

Let X be a single Bernoulli trial drawn from this distribution. The Fisher information contained in X is

$$\begin{aligned}
\mathcal{I}(\theta) &= -\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\log\left(\theta^{X}(1-\theta)^{1-X}\right)\right|\theta\right] \\
&= -\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\left(X\log\theta + (1-X)\log(1-\theta)\right)\right|\theta\right] \\
&= \operatorname{E}\left[\left.\frac{X}{\theta^{2}} + \frac{1-X}{(1-\theta)^{2}}\right|\theta\right] \\
&= \frac{\theta}{\theta^{2}} + \frac{1-\theta}{(1-\theta)^{2}} \\
&= \frac{1}{\theta(1-\theta)}.
\end{aligned}$$

Because Fisher information is additive over independent observations, the Fisher information contained in n independent Bernoulli trials is

$$\mathcal{I}(\theta) = \frac{n}{\theta(1-\theta)}.$$

If $x_i$ is one of the $2^{n}$ possible outcome sequences of n independent Bernoulli trials and $x_{ij}$ is the outcome of the j-th trial in that sequence, then the probability of $x_i$ is

$$p(x_i,\theta) = \prod_{j=1}^{n} \theta^{x_{ij}}(1-\theta)^{1-x_{ij}}.$$

The sample mean of the i-th sequence is $\mu_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$. The expected value of the sample mean (over the sampling distribution) is

$$E(\mu) = \sum_{x_i} \mu_i\, p(x_i,\theta) = \theta,$$

where the sum is over all $2^{n}$ possible outcome sequences.
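The single-trial formula, the additivity claim and the unbiasedness of the sample mean can all be checked by brute force. The following is a minimal Python sketch, not part of the original article: it assumes exact rational arithmetic via `fractions.Fraction`, and the helper names (`fisher_info_single`, `fisher_info_n`, `mean_of_sample_mean`) and the values θ = 3/10, n = 4 are illustrative choices. It evaluates −E[∂²_θ log p] directly over X ∈ {0, 1}, then enumerates all 2^n outcome sequences to compute the variance of the joint score and the expected sample mean.

```python
from fractions import Fraction
from itertools import product


def bernoulli_pmf(x, theta):
    """p(x; theta) = theta**x * (1 - theta)**(1 - x) for x in {0, 1}."""
    return theta if x == 1 else 1 - theta


def score(x, theta):
    """d/dtheta log p(x; theta) = x/theta - (1 - x)/(1 - theta)."""
    return Fraction(x) / theta - Fraction(1 - x) / (1 - theta)


def neg_second_derivative(x, theta):
    """-d^2/dtheta^2 log p(x; theta) = x/theta**2 + (1 - x)/(1 - theta)**2."""
    return Fraction(x) / theta**2 + Fraction(1 - x) / (1 - theta) ** 2


def fisher_info_single(theta):
    """I(theta) = -E[d^2/dtheta^2 log p(X; theta)], summed exactly over X in {0, 1}."""
    return sum(bernoulli_pmf(x, theta) * neg_second_derivative(x, theta) for x in (0, 1))


def outcomes(theta, n):
    """Yield (sequence, probability) for all 2**n outcome sequences x_i."""
    for xs in product((0, 1), repeat=n):
        p = Fraction(1)
        for x in xs:
            p *= bernoulli_pmf(x, theta)
        yield xs, p


def fisher_info_n(theta, n):
    """Variance of the joint score, computed as E[s^2] because the score has mean zero."""
    total = Fraction(0)
    for xs, p in outcomes(theta, n):
        s = sum(score(x, theta) for x in xs)  # joint score is a sum over trials
        total += p * s**2
    return total


def mean_of_sample_mean(theta, n):
    """E(mu) = sum over x_i of mu_i * p(x_i, theta)."""
    return sum(p * Fraction(sum(xs), n) for xs, p in outcomes(theta, n))


theta = Fraction(3, 10)
print(fisher_info_single(theta))      # 100/21, i.e. 1/(theta*(1-theta))
print(fisher_info_n(theta, 4))        # 400/21, i.e. 4/(theta*(1-theta))
print(mean_of_sample_mean(theta, 4))  # 3/10
```

Using exact fractions rather than floats means the identities hold with `==` instead of up to rounding error; for larger n the 2^n enumeration grows quickly, so this is only a check for small cases.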
The expected value of the square of the sample mean is

$$E(\mu^{2}) = \sum_{x_i} \mu_i^{2}\, p(x_i,\theta) = \frac{(1+(n-1)\theta)\,\theta}{n},$$

so the variance of the sample mean is

$$E(\mu^{2}) - E(\mu)^{2} = \frac{\theta(1-\theta)}{n}.$$

Thus the Fisher information of n trials is exactly the reciprocal of the variance of the sample mean, that is, of the proportion of successes in n Bernoulli trials. In general the reciprocal of the Fisher information is only a lower bound on the variance of an unbiased estimator; here the sample mean is unbiased and attains that bound, so the Cramér–Rao bound holds with equality.
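The moments above, and the claim that the Cramér–Rao bound is attained, can be verified by exact enumeration. This is a minimal Python sketch under stated assumptions, not the article's own method: the helper `cramer_rao_check` and the values θ = 3/10, n = 4 are hypothetical. It computes E(μ), E(μ²), Var(μ), the variance of the joint score (the Fisher information of the n trials) and Cov(μ, score), then checks that Cov² equals Var(μ)·Var(score), which is the equality case of the Cauchy–Schwarz argument.

```python
from fractions import Fraction
from itertools import product


def outcomes(theta, n):
    """Yield (sample_mean, joint_score, probability) for all 2**n outcome sequences."""
    for xs in product((0, 1), repeat=n):
        k = sum(xs)  # number of successes in this sequence
        p = theta**k * (1 - theta) ** (n - k)
        # joint score: sum over trials of x/theta - (1 - x)/(1 - theta)
        s = Fraction(k) / theta - Fraction(n - k) / (1 - theta)
        yield Fraction(k, n), s, p


def cramer_rao_check(theta, n):
    """Return E(mu), E(mu^2), Var(mu), Var(score), Cov(mu, score), computed exactly."""
    e_mu = e_mu2 = e_s = e_s2 = e_mu_s = Fraction(0)
    for mu, s, p in outcomes(theta, n):
        e_mu += p * mu
        e_mu2 += p * mu**2
        e_s += p * s
        e_s2 += p * s**2
        e_mu_s += p * mu * s
    var_mu = e_mu2 - e_mu**2
    var_s = e_s2 - e_s**2  # Var(score) is the Fisher information of the n trials
    cov = e_mu_s - e_mu * e_s
    return e_mu, e_mu2, var_mu, var_s, cov


e_mu, e_mu2, var_mu, info, cov = cramer_rao_check(Fraction(3, 10), 4)
print(e_mu, e_mu2, var_mu, info, cov)  # 3/10 57/400 21/400 400/21 1
print(var_mu == 1 / info)              # True: the bound is attained
```

Since Cov(μ, score) = 1, Cauchy–Schwarz gives Var(μ) ≥ 1/Var(score); the printed values show the two sides coincide for this example.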