--- title: "Best arm identification" chunk: 3/3 source: "https://en.wikipedia.org/wiki/Best_arm_identification" category: "reference" tags: "science, encyclopedia" date_saved: "2026-05-05T14:37:26.259834+00:00" instance: "kb-cron" ---
Update: $N_{a_t} \leftarrow N_{a_t} + 1$
Update empirical distribution $\hat{\nu}_{a_t}$
return $\hat{a}_T^{\star} \leftarrow \arg\max_a \hat{\mu}_a$

Unlike the fixed-confidence setting, there is no stopping rule, because sampling stops at the fixed time $T$. The algorithm is therefore based solely on a sampling rule.

=== Lower bound ===
The lower bound in the fixed-horizon setting gives the best confidence level that can be reached with a given number of turns $T$. It is expressed as an asymptotic result for large $T$.

Lower bound theorem: For any algorithm and any instance $\nu$, there exists a constant $H(\nu)$ that depends only on $\nu$ such that the probability of error satisfies

$$\lim_{T \to +\infty} \mathbb{P}\left(\hat{a}_T^{\star} \notin \mathcal{A}^{\star}\right) \geq \exp\left(-T H(\nu)\right)$$

This result shows that the error probability can decay at most exponentially fast in the number of turns $T$.

=== Simple regret ===
An alternative performance metric for fixed-horizon BAI is the simple regret, defined as

$$r_T := \mathbb{E}\left[\mu^{\star} - \mu_{\hat{a}_T^{\star}}\right],$$

which measures the expected suboptimality of the returned arm.
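Both the error probability and the simple regret can be estimated by simulation. The Python sketch below is illustrative only: it assumes Bernoulli arms and a uniform round-robin sampling rule, neither of which is prescribed by the text, and the names `fixed_budget_bai` and `evaluate` are hypothetical.

```python
import random


def fixed_budget_bai(means, T, rng):
    """Sample Bernoulli arms for exactly T rounds, then return the index
    of the arm with the largest empirical mean."""
    K = len(means)
    counts = [0] * K   # N_a: number of pulls of each arm
    sums = [0.0] * K   # cumulative reward of each arm
    for t in range(T):
        a = t % K  # assumed sampling rule: uniform round-robin
        reward = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1      # N_a <- N_a + 1
        sums[a] += reward   # update the empirical distribution (its mean)
    mu_hat = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(K)]
    # No stopping rule: the recommendation is made once the budget is spent.
    return max(range(K), key=lambda a: mu_hat[a])


def evaluate(means, T, runs=2000, seed=0):
    """Monte Carlo estimates of the error probability and the simple regret."""
    rng = random.Random(seed)
    mu_star = max(means)
    errors = 0
    regret = 0.0
    for _ in range(runs):
        a_hat = fixed_budget_bai(means, T, rng)
        errors += means[a_hat] < mu_star   # counts every mistake equally
        regret += mu_star - means[a_hat]   # weights a mistake by its gap
    return errors / runs, regret / runs


if __name__ == "__main__":
    for T in (30, 100, 300):
        err, reg = evaluate([0.5, 0.4, 0.1], T)
        print(f"T={T}: error probability ~ {err:.3f}, simple regret ~ {reg:.4f}")
```

In this sketch the estimated simple regret always lies between the smallest and the largest gap multiplied by the estimated error probability, since each mistake contributes the gap of the arm that was returned.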
While $\mathbb{P}(\hat{a}_T^{\star} \neq a^{\star})$ treats all mistakes as equally costly, the simple regret $r_T$ accounts for the gap between the optimal mean $\mu^{\star}$ and the mean $\mu_{\hat{a}_T^{\star}}$ of the arm returned by the algorithm. This distinction is important in applications where the cost of choosing a suboptimal arm depends on how far it is from optimal.

== See also ==
Multi-armed bandit
Design of experiments
Concentration inequality