kb/data/en.wikipedia.org/wiki/Fisher_information-2.md

---
title: "Fisher information"
chunk: 3/8
source: "https://en.wikipedia.org/wiki/Fisher_information"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:15.726073+00:00"
instance: "kb-cron"
---

In other words, the precision to which we can estimate θ is fundamentally limited by the Fisher information of the likelihood function.
Alternatively, the same conclusion can be obtained directly from the Cauchy–Schwarz inequality for random variables, $|\operatorname{Cov}(A,B)|^{2}\leq \operatorname{Var}(A)\operatorname{Var}(B)$, applied to the random variables $\hat{\theta}(X)$ and $\partial_{\theta}\log f(X;\theta)$, and observing that for unbiased estimators we have

$$
\operatorname{Cov}[\hat{\theta}(X),\partial_{\theta}\log f(X;\theta)]=\int \hat{\theta}(x)\,\partial_{\theta}f(x;\theta)\,dx=\partial_{\theta}\operatorname{E}[\hat{\theta}]=1.
$$
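As a brief sketch of how the bound follows from this, assume the usual regularity conditions, under which the score $\partial_{\theta}\log f(X;\theta)$ has mean zero and variance equal to the Fisher information $\mathcal{I}(\theta)$. Taking $A=\hat{\theta}(X)$ and $B=\partial_{\theta}\log f(X;\theta)$ in the Cauchy–Schwarz inequality gives

$$
\begin{aligned}
1&=\left|\operatorname{Cov}[\hat{\theta}(X),\partial_{\theta}\log f(X;\theta)]\right|^{2}\\
&\leq \operatorname{Var}(\hat{\theta}(X))\,\operatorname{Var}(\partial_{\theta}\log f(X;\theta))\\
&=\operatorname{Var}(\hat{\theta}(X))\,\mathcal{I}(\theta),
\end{aligned}
\qquad\text{so}\qquad
\operatorname{Var}(\hat{\theta}(X))\geq \frac{1}{\mathcal{I}(\theta)}.
$$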


== Examples ==

=== Single-parameter Bernoulli experiment ===
A Bernoulli trial is a random variable with two possible outcomes, 0 and 1, with 1 having a probability of θ. The outcome can be thought of as determined by the toss of a biased coin, with the probability of heads (1) being θ and the probability of tails (0) being 1 − θ.
Let X be a single Bernoulli trial, that is, one sample from this distribution. The Fisher information contained in X may be calculated to be:


$$
\begin{aligned}
\mathcal{I}(\theta)&=-\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\log\left(\theta^{X}(1-\theta)^{1-X}\right)\,\right|\,\theta\right]\\[5pt]
&=-\operatorname{E}\left[\left.\frac{\partial^{2}}{\partial\theta^{2}}\left(X\log\theta+(1-X)\log(1-\theta)\right)\,\right|\,\theta\right]\\[5pt]
&=\operatorname{E}\left[\left.\frac{X}{\theta^{2}}+\frac{1-X}{(1-\theta)^{2}}\,\right|\,\theta\right]\\[5pt]
&=\frac{\theta}{\theta^{2}}+\frac{1-\theta}{(1-\theta)^{2}}\\[5pt]
&=\frac{1}{\theta(1-\theta)}.
\end{aligned}
$$
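The calculation above can be checked numerically. The following is a minimal Python sketch, not part of the original derivation: it evaluates the negative expected second derivative of the log-likelihood by summing over the two outcomes of X. The function name `bernoulli_fisher_information` and the test value θ = 0.3 are illustrative choices.

```python
import math


def bernoulli_fisher_information(theta: float) -> float:
    """Fisher information of one Bernoulli(theta) trial.

    Evaluates -E[d^2/dtheta^2 log f(X; theta)] exactly by summing
    over the two possible outcomes x = 0 and x = 1.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    info = 0.0
    for x in (0, 1):
        prob = theta if x == 1 else 1.0 - theta
        # Negative second derivative of x*log(theta) + (1 - x)*log(1 - theta).
        neg_second_derivative = x / theta**2 + (1 - x) / (1.0 - theta) ** 2
        info += prob * neg_second_derivative
    return info


theta = 0.3  # illustrative value
closed_form = 1.0 / (theta * (1.0 - theta))
print(bernoulli_fisher_information(theta), closed_form)
print(math.isclose(bernoulli_fisher_information(theta), closed_form))
```

The two printed numbers should agree up to floating-point rounding for any θ strictly between 0 and 1.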


Because Fisher information is additive, the Fisher information contained in n independent Bernoulli trials is therefore


$$
\mathcal{I}(\theta)=\frac{n}{\theta(1-\theta)}.
$$
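To make the additivity step explicit for this example, here is a short sketch in which $X_{1},\dots,X_{n}$ denote the n independent trials (a notation introduced here for illustration only). The joint log-likelihood is a sum over trials, so its second derivative is also a sum, and each term contributes the single-trial information:

$$
\begin{aligned}
\log f(X_{1},\dots,X_{n};\theta)&=\sum_{j=1}^{n}\left(X_{j}\log\theta+(1-X_{j})\log(1-\theta)\right),\\[5pt]
-\operatorname{E}\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(X_{1},\dots,X_{n};\theta)\right]&=\sum_{j=1}^{n}\operatorname{E}\left[\frac{X_{j}}{\theta^{2}}+\frac{1-X_{j}}{(1-\theta)^{2}}\right]=\frac{n}{\theta(1-\theta)}.
\end{aligned}
$$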


If $x_{i}$ is one of the $2^{n}$ possible outcomes of n independent Bernoulli trials and $x_{ij}$ is the result of the j th trial in the i th outcome, then the probability of $x_{i}$ is given by

$$
p(x_{i},\theta)=\prod_{j=1}^{n}\theta^{x_{ij}}(1-\theta)^{1-x_{ij}}.
$$


The sample mean of the i th outcome is $\mu_{i}=(1/n)\sum_{j=1}^{n}x_{ij}$. The expected value of the sample mean (over the sampling distribution) is

$$
E(\mu)=\sum_{x_{i}}\mu_{i}\,p(x_{i},\theta)=\theta,
$$


where the sum is over all $2^{n}$ possible trial outcomes. The expected value of the square of the sample mean is

$$
E(\mu^{2})=\sum_{x_{i}}\mu_{i}^{2}\,p(x_{i},\theta)=\frac{(1+(n-1)\theta)\theta}{n},
$$


so the variance in the value of the mean is

$$
E(\mu^{2})-E(\mu)^{2}=\frac{\theta(1-\theta)}{n}.
$$


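The Fisher information is therefore the reciprocal of the variance of the mean number of successes in n Bernoulli trials. In general, the reciprocal of the Fisher information is only a lower bound on the variance of an unbiased estimator; in this case the sample mean attains that bound, so the Cramér–Rao bound is an equality.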
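The moments above can be verified by brute force for small n. The following Python sketch, offered as an illustration rather than as part of the original argument, enumerates all $2^{n}$ outcome sequences and accumulates the mean and mean square of the sample mean. The helper name `sample_mean_moments` and the values n = 8, θ = 0.3 are assumptions chosen for the example.

```python
import math
from itertools import product


def sample_mean_moments(n: int, theta: float) -> tuple[float, float]:
    """Return (E[mu], E[mu^2]) by summing over all 2**n outcome sequences."""
    mean = 0.0
    mean_sq = 0.0
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        # Probability of this particular sequence of n independent trials.
        prob = theta**successes * (1.0 - theta) ** (n - successes)
        mu = successes / n  # sample mean of this sequence
        mean += mu * prob
        mean_sq += mu**2 * prob
    return mean, mean_sq


n, theta = 8, 0.3  # illustrative values; the loop visits 2**n sequences
mean, mean_sq = sample_mean_moments(n, theta)
variance = mean_sq - mean**2
fisher_information = n / (theta * (1.0 - theta))

print(mean, theta)                         # expected value vs. theta
print(variance, 1.0 / fisher_information)  # variance vs. 1 / I(theta)
print(math.isclose(variance, 1.0 / fisher_information))
```

Because the enumeration grows as $2^{n}$, this check is only practical for small n; it is meant to confirm the closed-form expressions, not to replace them.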