---
title: "Fisher information"
chunk: 2/8
source: "https://en.wikipedia.org/wiki/Fisher_information"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:15.726073+00:00"
instance: "kb-cron"
---

1. The partial derivative of f(X; θ) with respect to θ exists almost everywhere. (It can fail to exist on a null set, as long as this set does not depend on θ.)
2. The integral of f(X; θ) can be differentiated under the integral sign with respect to θ.
3. The support of f(X; θ) does not depend on θ.

If θ is a vector, the regularity conditions must hold for every component of θ. It is easy to find an example of a density that does not satisfy the regularity conditions: the density of a Uniform(0, θ) variable fails to satisfy conditions 1 and 3. In this case, even though the Fisher information can be computed from the definition, it will not have the properties it is typically assumed to have.

=== In terms of likelihood ===
Because the likelihood of θ given X is always proportional to the probability f(X; θ), their logarithms differ by a constant that is independent of θ, so their derivatives with respect to θ are equal. Thus one can substitute the log-likelihood l(θ; X) for log f(X; θ) in the definitions of the Fisher information.

=== Samples of any size ===
The value X can represent a single sample drawn from a single distribution, or a collection of samples drawn from a collection of distributions. If there are n samples and the corresponding n distributions are statistically independent, then the Fisher information is the sum of the single-sample Fisher information values, one for each sample from its distribution. In particular, if the n distributions are independent and identically distributed, then the Fisher information is n times the Fisher information of a single sample from the common distribution.
Stated in other words, the Fisher information of n i.i.d. observations from a population equals n times the Fisher information of a single observation from that population.

=== Informal derivation of the Cramér–Rao bound ===
The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of θ. Van Trees (1968) and Frieden (2004) provide the following method of deriving the Cramér–Rao bound, a result that illustrates one use of the Fisher information.

Informally, we begin by considering an unbiased estimator $\hat{\theta}(X)$. Mathematically, "unbiased" means that

$$\operatorname{E}\left[\left.\hat{\theta}(X)-\theta \,\right|\, \theta\right] = \int \left(\hat{\theta}(x)-\theta\right) f(x;\theta)\,dx = 0 \quad \text{regardless of the value of } \theta.$$

Because this expression is zero for every θ, its partial derivative with respect to θ must also be zero. Moving the derivative inside the integral (which relies on the regularity conditions above) and applying the product rule gives

$$0 = \frac{\partial}{\partial\theta}\int \left(\hat{\theta}(x)-\theta\right) f(x;\theta)\,dx = \int \left(\hat{\theta}(x)-\theta\right)\frac{\partial f}{\partial\theta}\,dx - \int f\,dx.$$

For each θ, f(x; θ) is a probability density function in x, and therefore $\int f\,dx = 1$. By applying the chain rule to the partial derivative of $\log f$ and then multiplying and dividing by $f(x;\theta)$, one can verify that
$$\frac{\partial f}{\partial\theta} = f\,\frac{\partial \log f}{\partial\theta}.$$

Substituting these two facts into the product-rule identity, we get

$$\int \left(\hat{\theta}-\theta\right) f\,\frac{\partial \log f}{\partial\theta}\,dx = 1.$$

Factoring the integrand gives

$$\int \left(\left(\hat{\theta}-\theta\right)\sqrt{f}\right)\left(\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right)dx = 1.$$

Squaring both sides and applying the Cauchy–Schwarz inequality yields

$$1 = \left(\int \left[\left(\hat{\theta}-\theta\right)\sqrt{f}\right]\cdot\left[\sqrt{f}\,\frac{\partial \log f}{\partial\theta}\right]dx\right)^{2} \le \left[\int \left(\hat{\theta}-\theta\right)^{2} f\,dx\right]\cdot\left[\int \left(\frac{\partial \log f}{\partial\theta}\right)^{2} f\,dx\right].$$

The second bracketed factor is, by definition, the Fisher information, while the first bracketed factor is the mean squared error (MSE) of the estimator $\hat{\theta}$. Since the estimator is unbiased, its MSE equals its variance. Rearranging the inequality gives

$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{\mathcal{I}(\theta)}.$$
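The additivity over i.i.d. samples and the Cramér–Rao bound can both be checked on a small case. The sketch below assumes i.i.d. Bernoulli(p) observations, a model chosen here purely for illustration (it is not part of the text above), and its helper names (`bernoulli_pmf`, `joint_pmf`, `score`, `fisher_information`, `variance_of_sample_mean`) are hypothetical. It computes every expectation exactly by summing over all 2ⁿ outcomes, so no random sampling is involved. For this model the single-observation Fisher information is 1/(p(1 − p)), and the sample mean is an unbiased estimator whose variance p(1 − p)/n meets the bound with equality.

```python
from itertools import product
from math import isclose, prod


def bernoulli_pmf(x: int, p: float) -> float:
    """Probability f(x; p) of one Bernoulli(p) observation x in {0, 1}."""
    return p if x == 1 else 1.0 - p


def joint_pmf(xs: tuple[int, ...], p: float) -> float:
    """Independence: the joint probability is the product of the marginals."""
    return prod(bernoulli_pmf(x, p) for x in xs)


def score(xs: tuple[int, ...], p: float) -> float:
    """Derivative with respect to p of log f(xs; p).

    For one observation, log f = x log p + (1 - x) log(1 - p), whose
    derivative is x / p - (1 - x) / (1 - p). The joint log-probability is a
    sum over independent observations, so the per-observation scores add.
    """
    return sum(x / p - (1 - x) / (1 - p) for x in xs)


def fisher_information(n: int, p: float) -> float:
    """E[score**2] for n i.i.d. draws, summed exactly over all 2**n outcomes."""
    return sum(joint_pmf(xs, p) * score(xs, p) ** 2
               for xs in product((0, 1), repeat=n))


def variance_of_sample_mean(n: int, p: float) -> float:
    """Exact variance of the unbiased estimator p_hat = (x_1 + ... + x_n) / n."""
    outcomes = list(product((0, 1), repeat=n))
    mean = sum(joint_pmf(xs, p) * sum(xs) / n for xs in outcomes)
    second = sum(joint_pmf(xs, p) * (sum(xs) / n) ** 2 for xs in outcomes)
    return second - mean ** 2


if __name__ == "__main__":
    p, n = 0.3, 5
    single = fisher_information(1, p)
    total = fisher_information(n, p)
    print(single)                         # 1 / (p (1 - p)), about 4.76
    print(total / single)                 # n, up to floating-point rounding
    print(variance_of_sample_mean(n, p))  # p (1 - p) / n
    print(1 / total)                      # the Cramér–Rao bound: same value
```

Exhaustive enumeration keeps the check exact, but it costs 2ⁿ terms, so it is only practical for small n.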
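The Uniform(0, θ) density mentioned under the regularity conditions shows why they matter for this derivation: its support (0, θ) moves with θ, so the first step, moving the derivative inside the integral, is not justified. The sketch below uses exact rational arithmetic, and its function names are hypothetical. Computing the Fisher information of n draws directly from the definition gives n²/θ² (not n/θ², because the score here does not have mean zero), so the would-be bound is θ²/n². The unbiased estimator (n + 1)/n · max(X₁, …, Xₙ), a standard choice for this model that the text above does not discuss, has variance θ²/(n(n + 2)), which is strictly smaller.

```python
from fractions import Fraction


def fisher_information_from_definition(n: int, theta: Fraction) -> Fraction:
    """E[(d/dtheta log f)**2] for n i.i.d. Uniform(0, theta) draws.

    On the support (every x_i in (0, theta)) the joint density is
    theta**(-n), so log f = -n * log(theta) and the score is the constant
    -n / theta. Its expected square is n**2 / theta**2. Because the score
    does not have mean zero, this is not n times the one-draw value.
    """
    return Fraction(n * n) / theta**2


def variance_of_scaled_max(n: int, theta: Fraction) -> Fraction:
    """Exact variance of the unbiased estimator (n + 1) / n * max(X_1, ..., X_n).

    The maximum M of n Uniform(0, theta) draws has density
    n * m**(n - 1) / theta**n on (0, theta), which gives
    E[M] = n * theta / (n + 1) and E[M**2] = n * theta**2 / (n + 2).
    """
    mean_m = Fraction(n, n + 1) * theta
    second_m = Fraction(n, n + 2) * theta**2
    return Fraction(n + 1, n) ** 2 * (second_m - mean_m**2)


if __name__ == "__main__":
    theta = Fraction(2)
    for n in (1, 2, 10):
        would_be_bound = 1 / fisher_information_from_definition(n, theta)
        actual = variance_of_scaled_max(n, theta)
        # The unbiased estimator's variance falls below the would-be bound.
        print(n, actual, would_be_bound, actual < would_be_bound)
    # prints: 1 4/3 4 True, then 2 1/2 1 True, then 10 1/30 1/25 True
```

Without the regularity conditions, then, the inequality can fail outright, and the Fisher information computed from the definition is not additive across independent samples either.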