| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Algorithm IMED | 2/2 | https://en.wikipedia.org/wiki/Algorithm_IMED | reference | science, encyclopedia | 2026-05-05T14:37:18.993972+00:00 | kb-cron |
=== Lai–Robbins lower bound ===
In 1985, Lai and Robbins proved an asymptotic, problem-dependent lower bound on regret. In 2018, Aurélien Garivier, Pierre Ménard and Gilles Stoltz proved a refined lower bound that also gives the second-order term. It states that for every consistent algorithm on the set $\mathcal{P}([-\infty,1])$ (that is, an algorithm for which, for every $(\nu_1,\dots,\nu_K)\in\mathcal{P}([-\infty,1])^K$, the regret $R_T$ is subpolynomial, i.e. $R_T=o_{T\to+\infty}(T^{\alpha})$ for all $\alpha>0$), we have:
$$R_T\geq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-\Omega_{T\to+\infty}(\ln\ln T).$$
This bound is asymptotic (as $T\to+\infty$). It gives a first-order term of order $\ln T$ with the optimal constant in front of it, and a second-order term in $-\Omega(\ln\ln T)$.
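As a rough numerical illustration of the first-order term, the sketch below evaluates the constant $\sum_{a:\mu_a<\mu^*}\Delta_a/\mathcal{K}_{\inf}(\nu_a,\mu^*)$ for Bernoulli arms. It assumes that, for Bernoulli distributions, $\mathcal{K}_{\inf}(\nu_a,\mu^*)$ reduces to the Bernoulli Kullback–Leibler divergence between the two means; this special case is not stated in the text above. The function names `bernoulli_kl`, `leading_constant` and `first_order_term` are hypothetical helpers introduced only for this example.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats.

    Both arguments are clipped away from 0 and 1 to avoid log(0).
    """
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def leading_constant(means):
    """Sum over suboptimal arms of gap / KL(arm mean, best mean).

    Assumption: Bernoulli arms, so K_inf is replaced by bernoulli_kl.
    """
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


def first_order_term(means, horizon):
    """First-order term of the bound: leading constant times ln(T)."""
    return leading_constant(means) * math.log(horizon)


if __name__ == "__main__":
    arm_means = [0.5, 0.4, 0.3]  # hypothetical problem instance
    for horizon in (10**3, 10**4, 10**5):
        print(horizon, first_order_term(arm_means, horizon))
```

Arms whose mean is close to the best mean contribute the most to the constant: the gap $\Delta_a$ shrinks linearly while the divergence shrinks roughly quadratically.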
=== Regret bound for IMED ===
If the distribution of every arm $a$ is supported on $(-\infty,1]$ (i.e. $\nu_a\in\mathcal{P}([-\infty,1])$), then the regret of the IMED algorithm satisfies
$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T+O(1).$$
If all the distributions $\nu_a$ are bounded, then there exists a constant $C>0$ such that, for $T$ large enough, the regret of IMED is upper bounded by
$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-C\ln\ln T.$$
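To get a feel for these bounds, the following minimal simulation runs an IMED-style rule on Bernoulli arms and reports the pseudo-regret. It is a sketch under stated assumptions, not a reference implementation: the index $N_a\,\mathcal{K}_{\inf}(\hat\nu_a,\hat\mu^*)+\ln N_a$ (pull the arm with the smallest index) is the usual IMED rule but is not restated in this section, and $\mathcal{K}_{\inf}$ is replaced by the Bernoulli KL divergence, which is only valid for Bernoulli rewards. The names `run_imed` and `bernoulli_kl` are hypothetical.

```python
import math
import random


def bernoulli_kl(p, q):
    """Bernoulli KL divergence in nats, with clipping to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def run_imed(means, horizon, seed=0):
    """Simulate an IMED-style rule on Bernoulli arms.

    Returns (pull counts per arm, cumulative pseudo-regret).
    Assumed index: N_a * KL(mean_a, best empirical mean) + ln(N_a).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initialisation: pull every arm once
        else:
            mu_hat = [sums[a] / counts[a] for a in range(k)]
            mu_star = max(mu_hat)
            index = [
                counts[a] * bernoulli_kl(mu_hat[a], mu_star) + math.log(counts[a])
                for a in range(k)
            ]
            arm = min(range(k), key=index.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses the true gaps
    return counts, regret


if __name__ == "__main__":
    arm_means = [0.5, 0.4]  # hypothetical problem instance
    for horizon in (10**3, 10**4):
        counts, regret = run_imed(arm_means, horizon)
        constant = (0.5 - 0.4) / bernoulli_kl(0.4, 0.5)
        print(horizon, counts, regret, constant * math.log(horizon))
```

A single run is noisy, and the bounds above concern the expected regret, so a comparison with the $\ln T$ term is only meaningful after averaging over many seeds.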
== Computation time ==
The algorithm only requires computing $\mathcal{K}_{\inf}$ for suboptimal arms, which are pulled $O(\ln T)$ times; this makes it much faster than KL-UCB. A faster version of IMED was developed in 2023, using a first-order Taylor expansion of $\mathcal{K}_{\inf}$.
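The cost of one $\mathcal{K}_{\inf}$ evaluation can be made concrete with a small sketch. It assumes the dual representation $\mathcal{K}_{\inf}(\hat\nu,\mu)=\max_{0\le\lambda\le 1/(1-\mu)}\frac{1}{n}\sum_i\ln\bigl(1-\lambda(x_i-\mu)\bigr)$ for samples $x_i\le 1$ and $\mu<1$, which is commonly used for this quantity but is not given in the text above. The concave maximisation is done here by a plain ternary search, chosen for simplicity rather than speed; the function name `k_inf` and its parameters are hypothetical.

```python
import math


def k_inf(samples, mu, iterations=200):
    """Empirical K_inf via a one-dimensional concave maximisation.

    Assumptions (not from the text above): samples are <= 1 and the
    dual form max over lambda in [0, 1/(1-mu)] of
    mean(log(1 - lambda * (x - mu))) holds.
    """
    n = len(samples)
    mean = sum(samples) / n
    if mean >= mu:
        return 0.0  # no optimisation needed for the empirically best arm
    if mu >= 1:
        return math.inf  # no distribution on (-inf, 1] has mean above 1

    def objective(lam):
        total = 0.0
        for x in samples:
            arg = 1.0 - lam * (x - mu)
            if arg <= 0.0:
                return -math.inf
            total += math.log(arg)
        return total / n

    lo, hi = 0.0, 1.0 / (1.0 - mu)
    for _ in range(iterations):  # each step costs O(n)
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return max(0.0, objective((lo + hi) / 2))


if __name__ == "__main__":
    data = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical rewards, mean 0.4
    print(k_inf(data, 0.5))
    print(k_inf(data, 0.3))
```

Each evaluation scans all samples of the arm at every search step, and the early return shows why an arm whose empirical mean is already at least $\mu$ costs nothing.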
== See also ==
- Multi-armed bandit
- Kullback–Leibler Upper Confidence Bound
- Confidence interval