---
title: "Best arm identification"
chunk: 3/3
source: "https://en.wikipedia.org/wiki/Best_arm_identification"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T14:37:26.259834+00:00"
instance: "kb-cron"
---

Update: $N_{a_t} \leftarrow N_{a_t} + 1$

Update empirical distribution $\hat{\nu}_{a_t}$

return $\hat{a}_T^{\star} \leftarrow \arg\max_a \hat{\mu}_a$

Unlike the fixed-confidence setting, there is no stopping rule because the algorithm always stops at time $T$. The algorithm is based only on a sampling rule.

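The update and return steps above can be sketched as a short fixed-horizon loop. This is a minimal illustration, not the article's algorithm: the round-robin default, the `sampling_rule` signature, and the `bernoulli` helper are all assumptions introduced here, and the empirical distribution $\hat{\nu}_a$ is summarized only by its mean $\hat{\mu}_a$.

```python
import random


def bernoulli(p):
    """Hypothetical helper: an arm that pays 1.0 with probability p."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


def fixed_budget_bai(arms, T, sampling_rule=None, seed=0):
    """Run T pulls, then recommend the arm with the highest empirical mean.

    arms: list of callables taking a random.Random and returning a reward.
    sampling_rule: optional callable (t, counts, sums) -> arm index.
                   Defaults to round-robin, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    K = len(arms)
    counts = [0] * K      # N_a: number of pulls of each arm
    sums = [0.0] * K      # running reward sums, enough to recover the means

    for t in range(T):    # no stopping rule: the loop always ends at time T
        if sampling_rule is None:
            a_t = t % K
        else:
            a_t = sampling_rule(t, counts, sums)
        reward = arms[a_t](rng)
        counts[a_t] += 1          # N_{a_t} <- N_{a_t} + 1
        sums[a_t] += reward       # update the empirical summary of arm a_t

    # Arms that were never pulled get -inf so they are never recommended.
    means = [s / n if n > 0 else float("-inf") for s, n in zip(sums, counts)]
    return max(range(K), key=lambda a: means[a])


best = fixed_budget_bai([bernoulli(0.2), bernoulli(0.8)], T=2000)
```

Passing a different `sampling_rule` changes how the budget is spent without touching the rest of the loop, which reflects the point that the sampling rule is the only design choice in this setting.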
=== Lower bound ===
The lower bound in the fixed-horizon setting gives the best confidence level we can reach with a given number of turns $T$. It is expressed as an asymptotic result when $T$ is large.

Lower bound theorem: For any algorithm, for any instance $\nu$, there exists a constant $H(\nu)$ that depends only on $\nu$ such that the probability of error satisfies

$$\lim_{T \to +\infty} \mathbb{P}\left(\hat{a}_T^{\star} \notin \mathcal{A}^{\star}\right) \geq \exp\left(-T H(\nu)\right)$$

This result shows that the error probability can decay at most exponentially fast in the number of turns $T$: no algorithm can drive it to zero faster than the rate $H(\nu)$ allows.

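As a rough numerical reading of the bound, one can invert $\exp(-T H(\nu)) \leq \delta$ to see how many turns are needed before an error level $\delta$ is even possible. The sketch below makes two assumptions that are not in the text: it treats the asymptotic bound as if it held at finite $T$, and the value of $H(\nu)$ passed in is purely hypothetical. The function names are illustrative.

```python
import math


def error_floor(T, H):
    """Value of exp(-T * H): the floor on the error probability (heuristic at finite T)."""
    return math.exp(-T * H)


def min_budget(delta, H):
    """Smallest integer T with exp(-T * H) <= delta, i.e. T >= ln(1/delta) / H."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if H <= 0.0:
        raise ValueError("H must be positive")
    return math.ceil(math.log(1.0 / delta) / H)


# Hypothetical complexity constant H = 0.1 and target error level 1%.
T_needed = min_budget(0.01, 0.1)
```

The required budget grows only logarithmically in $1/\delta$ but inversely in $H(\nu)$, so instances with a small constant are the expensive ones.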
=== Simple regret ===
An alternative performance metric for fixed-horizon BAI is the simple regret, defined as

$$r_T := \mathbb{E}\left[\mu^{\star} - \mu_{\hat{a}_T^{\star}}\right],$$

which measures the expected suboptimality of the returned arm.

While $\mathbb{P}(\hat{a}_T^{\star} \neq a^{\star})$ treats all mistakes with the same cost, the simple regret $r_T$ accounts for the gap between the optimal mean $\mu^{\star}$ and the mean $\mu_{\hat{a}_T^{\star}}$ of the arm that the algorithm returns as optimal. This distinction is important in applications where the cost of choosing a suboptimal arm depends on how far it is from optimal.

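The difference between the two metrics can be made concrete with a small computation. In the sketch below, the arm means and the distributions over the recommended arm are made-up numbers chosen for illustration, and the function names are hypothetical. Both recommenders are wrong 20% of the time, but one errs toward a near-optimal arm and the other toward a poor one.

```python
def error_probability(means, recommend_probs):
    """P(returned arm is not optimal), given the distribution of the returned arm."""
    best = max(means)
    return sum(p for mu, p in zip(means, recommend_probs) if mu < best)


def simple_regret(means, recommend_probs):
    """E[mu_star - mu_returned], given the distribution of the returned arm."""
    best = max(means)
    return sum(p * (best - mu) for mu, p in zip(means, recommend_probs))


# Illustrative instance: arm 0 is optimal, arm 1 is close, arm 2 is far.
means = [0.9, 0.85, 0.1]

# Two hypothetical recommenders with the same 20% chance of a mistake.
near_miss = [0.8, 0.2, 0.0]   # mistakes land on the near-optimal arm
far_miss = [0.8, 0.0, 0.2]    # mistakes land on the poor arm

same_error = (error_probability(means, near_miss),
              error_probability(means, far_miss))
regrets = (simple_regret(means, near_miss),
           simple_regret(means, far_miss))
```

Here the error probabilities coincide while the simple regrets differ by a factor of sixteen (about 0.01 versus about 0.16), which is the distinction the paragraph above describes.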
== See also ==
Multi-armed bandit
Design of experiments
Concentration inequality