kb/data/en.wikipedia.org/wiki/Best_arm_identification-2.md

---
title: "Best arm identification"
chunk: 3/3
source: "https://en.wikipedia.org/wiki/Best_arm_identification"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T14:37:26.259834+00:00"
instance: "kb-cron"
---

    Update: $N_{a_t} \leftarrow N_{a_t} + 1$
    Update empirical distribution $\hat{\nu}_{a_t}$
return $\hat{a}_T^\star \leftarrow \arg\max_a \hat{\mu}_a$


Unlike the fixed-confidence setting, there is no stopping rule because we stop at time $T$. The algorithm is based only on a sampling rule.
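The loop above can be sketched in Python as follows. This is a minimal illustration, not the article's reference algorithm: the function name `fixed_budget_bai`, the `sampling_rule` hook, the round-robin default, and the Bernoulli arms are all assumptions made for the example, and the sketch tracks only the empirical mean of each arm rather than its full empirical distribution.

```python
import random


def fixed_budget_bai(arms, T, sampling_rule=None, seed=0):
    """Run a fixed-horizon best-arm-identification loop for exactly T turns.

    arms: list of callables; arms[a](rng) draws one reward from arm a.
    sampling_rule: hypothetical hook (t, counts, means) -> arm index.
                   Defaults to round-robin, an assumption of this sketch.
    """
    rng = random.Random(seed)
    K = len(arms)
    counts = [0] * K    # N_a: number of pulls of arm a
    means = [0.0] * K   # empirical mean of arm a
    if sampling_rule is None:
        sampling_rule = lambda t, counts, means: t % K

    for t in range(T):
        a = sampling_rule(t, counts, means)       # sampling rule picks a_t
        x = arms[a](rng)                          # observe one reward
        counts[a] += 1                            # N_{a_t} <- N_{a_t} + 1
        means[a] += (x - means[a]) / counts[a]    # running-mean update

    # No stopping rule: recommend the empirical best arm after T turns.
    best = max(range(K), key=lambda a: means[a])
    return best, counts, means


def bernoulli(p):
    """Hypothetical arm: reward 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


best, counts, means = fixed_budget_bai(
    [bernoulli(0.2), bernoulli(0.5), bernoulli(0.8)], T=3000
)
```

Passing a different `sampling_rule` changes how the budget is spread over the arms, while the recommendation step stays the same.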

=== Lower bound ===
The lower bound in the fixed-horizon setting gives the best confidence level we can reach with a given number of turns $T$. It is expressed as an asymptotic result when $T$ is large.
Lower bound theorem: For any algorithm, for any instance $\nu$, there exists a constant $H(\nu)$ that depends only on $\nu$ such that the probability of error satisfies

$$\lim_{T \to +\infty} \mathbb{P}(\hat{a}_T \notin \mathcal{A}^\star) \geq \exp\left(-T H(\nu)\right)$$


This result shows that the error probability cannot decay faster than exponentially in the number of turns $T$.
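As an informal reading of the bound (the theorem itself is stated only in the limit, so this is a heuristic and not a finite-time guarantee), it can be inverted to see how the budget must scale with a target error level. The symbol $\delta$ below is introduced here for illustration and assumes $H(\nu) > 0$:

```latex
\exp\left(-T H(\nu)\right) \;\le\; \mathbb{P}(\hat{a}_T \notin \mathcal{A}^\star) \;\le\; \delta
\quad\Longrightarrow\quad
-T H(\nu) \le \log \delta
\quad\Longrightarrow\quad
T \;\ge\; \frac{\log(1/\delta)}{H(\nu)}
```

Under this reading, halving the target error $\delta$ adds roughly $\log(2)/H(\nu)$ turns to the required budget.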

=== Simple regret ===
An alternative performance metric for fixed-horizon BAI is the simple regret, defined as

$$r_T := \mathbb{E}\left[\mu^\star - \mu_{\hat{a}_T^\star}\right],$$


which measures the expected suboptimality of the returned arm.
While $\mathbb{P}(\hat{a}_T^\star \neq a^\star)$ treats all mistakes with the same cost, the simple regret $r_T$ accounts for the gap between the optimal mean $\mu^\star$ and the mean $\mu_{\hat{a}_T^\star}$ of the arm that the algorithm returns as optimal. This distinction is important in applications where the cost of choosing a suboptimal arm depends on how far it is from optimal.
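The difference between the two metrics can be illustrated with a small Python computation. The arm means and recommendation probabilities below are invented for the example, and the helper names `error_probability` and `simple_regret` are hypothetical. Both helpers take the true means and the probability that the algorithm returns each arm.

```python
def error_probability(means, rec_probs):
    """Probability that the returned arm is not an optimal arm."""
    best = max(means)
    return sum(p for mu, p in zip(means, rec_probs) if mu < best)


def simple_regret(means, rec_probs):
    """Expected gap between the optimal mean and the returned arm's mean."""
    best = max(means)
    return sum(p * (best - mu) for mu, p in zip(means, rec_probs))


# Illustrative instance: arm 0 is optimal, arm 1 is close, arm 2 is far.
means = [0.9, 0.85, 0.1]

# Two hypothetical recommendation distributions with the same error rate.
near_miss = [0.8, 0.2, 0.0]   # mistakes land on the near-optimal arm
far_miss = [0.8, 0.0, 0.2]    # mistakes land on the worst arm

err_near = error_probability(means, near_miss)
err_far = error_probability(means, far_miss)
reg_near = simple_regret(means, near_miss)
reg_far = simple_regret(means, far_miss)
```

Both distributions have an error probability of 0.2, but the simple regret is about 0.01 in the first case and about 0.16 in the second, because the second one errs on an arm much further from optimal.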

== See also ==
Multi-armed bandit
Design of experiments
Concentration inequality
