---
title: "Algorithm IMED"
chunk: 2/2
source: "https://en.wikipedia.org/wiki/Algorithm_IMED"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T14:37:18.993972+00:00"
instance: "kb-cron"
---

=== Lai–Robbins lower bound ===
In 1985, Lai and Robbins proved an asymptotic, problem-dependent lower bound on the regret. In 2018, Aurelien Garivier, Pierre Menard and Gilles Stoltz proved a refined lower bound that also gives the second-order term.
It states that for every consistent algorithm on the set $\mathcal{P}([-\infty,1])$, that is, an algorithm for which, for every $(\nu_1,\dots,\nu_K)\in\mathcal{P}([-\infty,1])^K$, the regret $R_T$ is subpolynomial (i.e. $R_T=o_{T\to+\infty}(T^{\alpha})$ for all $\alpha>0$), we have:

$$R_T\geq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-\Omega_{T\to+\infty}(\ln\ln T).$$

This bound is asymptotic (as $T\to+\infty$). It gives a first-order lower bound of order $\ln T$, with the optimal constant in front of it, and a second-order term of order $-\Omega(\ln\ln T)$.

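As a rough illustration of the first-order term, the sketch below evaluates the constant in front of $\ln T$ for Bernoulli arms. It assumes that, for Bernoulli distributions, $\mathcal{K}_{\inf}(\nu_a,\mu^*)$ reduces to the Bernoulli Kullback–Leibler divergence $\mathrm{kl}(\mu_a,\mu^*)$; the function names and the example means are hypothetical and chosen only for this example.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def first_order_constant(means):
    """Sum over suboptimal arms of gap / kl(mean, best mean).

    Assumption: Bernoulli arms, so K_inf is taken to be the Bernoulli KL.
    """
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


if __name__ == "__main__":
    means = [0.5, 0.6, 0.8]  # illustrative arm means, not from the article
    c = first_order_constant(means)
    for horizon in (10**3, 10**5, 10**7):
        # First-order term of the bound only; the ln ln T term is ignored.
        print(horizon, c * math.log(horizon))
```

The smaller the divergence between a suboptimal arm and the best mean, the larger that arm's contribution to the constant, which matches the intuition that near-optimal arms are the hardest to rule out.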
=== Regret bound for IMED ===
If the distribution of every arm $a$ is supported on $(-\infty,1]$ (i.e. $\nu_a\in\mathcal{P}([-\infty,1])$), then the regret of the IMED algorithm satisfies

$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T+O(1).$$

If all the distributions $\nu_a$ are bounded, then there exists a constant $C>0$ such that, for $T$ large enough, the regret of IMED is upper bounded by

$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-C\ln\ln T.$$

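The logarithmic growth of the regret can be observed in a small simulation. The following is a minimal sketch, not a reference implementation: it assumes Bernoulli arms (so that $\mathcal{K}_{\inf}$ is taken to be the Bernoulli KL divergence) and assumes the usual IMED index $N_a\,\mathcal{K}_{\inf}(\hat{\nu}_a,\hat{\mu}^*)+\ln N_a$, which is not restated in this section. The names `run_imed` and `bernoulli_kl`, the arm means and the horizon are all hypothetical.

```python
import math
import random


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def run_imed(means, horizon, seed=0):
    """Simulate an IMED-style strategy on Bernoulli arms.

    Returns (pull counts, pseudo-regret). Assumed index for arm a:
    N_a * kl(mean_a, best empirical mean) + ln(N_a); the arm with the
    smallest index is pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # initialisation: pull every arm once
        else:
            mu_hat = [sums[a] / counts[a] for a in range(k)]
            mu_star = max(mu_hat)
            index = [
                counts[a] * bernoulli_kl(mu_hat[a], mu_star) + math.log(counts[a])
                for a in range(k)
            ]
            arm = min(range(k), key=index.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    regret = sum((best - means[a]) * counts[a] for a in range(k))
    return counts, regret


if __name__ == "__main__":
    means = [0.5, 0.8]  # illustrative arm means
    horizon = 5000
    counts, regret = run_imed(means, horizon)
    constant = (0.8 - 0.5) / bernoulli_kl(0.5, 0.8)
    print("pulls per arm:", counts)
    print("pseudo-regret:", regret)
    print("first-order term:", constant * math.log(horizon))
```

A single run is noisy, so the printed pseudo-regret should only be expected to be of the same order as the first-order term; averaging over many seeds gives a cleaner comparison.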
== Computation time ==
The algorithm only needs to compute $\mathcal{K}_{\inf}$ for suboptimal arms, and these are pulled only $O(\ln T)$ times, which makes it much faster than KL-UCB. A faster version of IMED was developed in 2023, using a first-order Taylor expansion of $\mathcal{K}_{\inf}$.

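To give a sense of where the cost comes from, the sketch below evaluates $\mathcal{K}_{\inf}$ for an empirical distribution. It relies on an assumption that is not stated in this section: the variational form $\mathcal{K}_{\inf}(F,\mu)=\max_{0\le\lambda\le 1/(1-\mu)}\mathbb{E}_F[\ln(1-\lambda(X-\mu))]$ for distributions on $(-\infty,1]$ and $\mu<1$, as given in the literature on IMED. The function name `k_inf` and the use of ternary search are choices made for this example only.

```python
import math


def k_inf(samples, mu, iters=200):
    """Approximate K_inf(empirical distribution of samples, mu).

    Assumes every sample is <= 1 and mu < 1, and uses the assumed dual form
    max over lam in [0, 1/(1-mu)] of mean(log(1 - lam * (x - mu))).
    """
    if mu >= 1.0:
        raise ValueError("this sketch requires mu < 1")
    if any(x > 1.0 for x in samples):
        raise ValueError("samples must be <= 1")
    mean = sum(samples) / len(samples)
    if mean >= mu:
        # Nothing to optimise: the empirical mean already reaches mu.
        return 0.0

    def objective(lam):
        total = 0.0
        for x in samples:
            arg = 1.0 - lam * (x - mu)
            if arg <= 0.0:
                return -math.inf  # outside the domain of the logarithm
            total += math.log(arg)
        return total / len(samples)

    # The objective is concave in lam, so ternary search finds its maximum.
    lo, hi = 0.0, 1.0 / (1.0 - mu)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return max(0.0, objective((lo + hi) / 2.0))


if __name__ == "__main__":
    samples = [0.0] * 5 + [1.0] * 5  # empirical Bernoulli(0.5)
    print(k_inf(samples, 0.8))
    print(k_inf([1.0, 1.0, 0.9], 0.5))  # mean above mu: no optimisation needed
```

Each evaluation of the objective is linear in the number of samples, and the arm with the highest empirical mean takes the early-return branch, so in this sketch the optimisation is only ever run for arms that currently look suboptimal.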
== See also ==
* Multi-armed bandit
* Kullback–Leibler Upper Confidence Bound
* Confidence interval
