---
title: "Algorithm IMED"
chunk: 2/2
source: "https://en.wikipedia.org/wiki/Algorithm_IMED"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T14:37:18.993972+00:00"
instance: "kb-cron"
---
=== Lai–Robbins lower bound ===
In 1985, Lai and Robbins proved an asymptotic, problem-dependent lower bound on the regret. In 2018, Aurélien Garivier, Pierre Ménard and Gilles Stoltz proved a refined lower bound that also gives the second-order term.
It states that for every consistent algorithm on the set $\mathcal{P}([-\infty,1])$, that is, an algorithm for which, for every $(\nu_1,\dots,\nu_K)\in\mathcal{P}([-\infty,1])^K$, the regret $R_T$ is subpolynomial (i.e. $R_T=o_{T\to+\infty}(T^{\alpha})$ for all $\alpha>0$), we have:
$$R_T\geq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-\Omega_{T\to+\infty}(\ln\ln T).$$
This bound is asymptotic (as $T\to+\infty$). Its first-order term is of order $\ln T$, with the optimal constant in front of it, and its second-order term is $-\Omega(\ln\ln T)$.
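To get a feel for the size of the first-order term, the constant in front of $\ln T$ can be evaluated numerically. The following is a minimal sketch that assumes Bernoulli arms, for which $\mathcal{K}_{\inf}(\nu_a,\mu^*)$ reduces to the binary Kullback–Leibler divergence between the means. The function names `bernoulli_kl` and `lai_robbins_constant` and the example means are hypothetical, and the second-order term is ignored.

```python
import math

def bernoulli_kl(p, q):
    """Binary KL divergence kl(p, q) for 0 <= p <= 1 and 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def lai_robbins_constant(means):
    """Sum over suboptimal arms of gap / kl(mu_a, mu_star) (Bernoulli assumption)."""
    mu_star = max(means)
    return sum((mu_star - mu) / bernoulli_kl(mu, mu_star)
               for mu in means if mu < mu_star)

# Illustrative means only; any values with max(means) < 1 work here.
means = [0.5, 0.4, 0.3]
c = lai_robbins_constant(means)
for horizon in (10**3, 10**6):
    # First-order term of the lower bound, c * ln T.
    print(horizon, c * math.log(horizon))
```

Arms whose mean is close to $\mu^*$ have a small divergence in the denominator, so they dominate the constant.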
=== Regret bound for IMED ===
If the distribution of every arm $a$ is supported on $(-\infty,1]$ (i.e. $\nu_a\in\mathcal{P}([-\infty,1])$), then the regret of the IMED algorithm satisfies
$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T+O(1)$$
If all the distributions $\nu_a$ are bounded, then there exists a constant $C>0$ such that, for $T$ large enough, the regret of IMED is upper bounded by
$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-C\ln\ln T$$
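The upper bound can be compared with a simulation. The sketch below is not taken from this section: it assumes Bernoulli arms and the IMED index $N_a(t)\,\mathcal{K}_{\inf}(\hat\nu_a,\hat\mu^*)+\ln N_a(t)$ from Honda and Takemura, with $\mathcal{K}_{\inf}$ replaced by the binary Kullback–Leibler divergence. The names `run_imed` and `bernoulli_kl`, the clipping constant and the example means are hypothetical.

```python
import math
import random

def bernoulli_kl(p, q):
    """Binary KL divergence, with q clipped away from 0 and 1 to stay finite."""
    q = min(max(q, 1e-12), 1 - 1e-12)
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def run_imed(means, horizon, seed=0):
    """Simulate IMED on Bernoulli arms and return the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once to initialise the estimates
        else:
            mu_hat = [sums[a] / counts[a] for a in range(k)]
            best = max(mu_hat)
            # Assumed IMED index: N_a * K_inf(nu_hat_a, mu_hat_star) + ln N_a.
            index = [counts[a] * bernoulli_kl(mu_hat[a], best)
                     + math.log(counts[a]) for a in range(k)]
            arm = min(range(k), key=index.__getitem__)  # smallest index wins
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

means = [0.8, 0.6, 0.5]  # illustrative values
horizon = 5000
counts = run_imed(means, horizon)
mu_star = max(means)
# Pseudo-regret: sum of gaps weighted by the number of pulls.
pseudo_regret = sum((mu_star - m) * n for m, n in zip(means, counts))
leading_term = sum((mu_star - m) / bernoulli_kl(m, mu_star)
                   for m in means if m < mu_star) * math.log(horizon)
print(counts, pseudo_regret, leading_term)
```

A single run is noisy, so a meaningful comparison with the leading term would average the pseudo-regret over many seeds.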
== Computation time ==
The algorithm only needs to compute $\mathcal{K}_{\inf}$ for suboptimal arms, which are pulled $O(\ln T)$ times; this makes it much faster than KL-UCB. A faster version of IMED was developed in 2023, using a first-order Taylor expansion of $\mathcal{K}_{\inf}$.
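To illustrate why $\mathcal{K}_{\inf}$ is the costly step, the sketch below evaluates it for an empirical distribution supported on $(-\infty,1]$. It assumes the dual representation $\mathcal{K}_{\inf}(\nu,\mu)=\max_{0\leq\lambda\leq 1/(1-\mu)}\mathbb{E}_{X\sim\nu}[\ln(1-\lambda(X-\mu))]$ from Honda and Takemura, which is not stated in this section, and solves the one-dimensional concave maximisation by ternary search. The function name `k_inf` and the iteration count are hypothetical choices.

```python
import math

def k_inf(samples, mu, iterations=200):
    """Approximate K_inf(empirical distribution of samples, mu), for mu < 1.

    Assumes every sample is at most 1 and uses the dual formula named above.
    """
    if not samples or mu >= 1:
        raise ValueError("need at least one sample and mu < 1")
    if sum(samples) / len(samples) >= mu:
        return 0.0  # the empirical mean already reaches mu

    def objective(lam):
        total = 0.0
        for x in samples:
            arg = 1.0 - lam * (x - mu)
            if arg <= 0.0:
                return -math.inf  # outside the domain of the logarithm
            total += math.log(arg)
        return total / len(samples)

    # Ternary search: the objective is concave in lam on [0, 1 / (1 - mu)].
    lo, hi = 0.0, 1.0 / (1.0 - mu)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return max(0.0, objective(0.5 * (lo + hi)))

# For 0/1 rewards the result should agree with the binary KL divergence.
value = k_inf([1.0, 0.0, 0.0, 0.0], 0.5)
reference = 0.25 * math.log(0.5) + 0.75 * math.log(1.5)
print(value, reference)
```

Every evaluation of the objective scans all samples of the arm, so the cost grows with the number of pulls. For the empirically best arm the empirical mean equals the target mean, so the function returns 0 without running the search.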
== See also ==
Multi-armed bandit
Kullback–Leibler Upper Confidence Bound
Confidence interval