kb/data/en.wikipedia.org/wiki/Algorithm_IMED-1.md

---
title: "Algorithm IMED"
chunk: 2/2
source: "https://en.wikipedia.org/wiki/Algorithm_IMED"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T14:37:18.993972+00:00"
instance: "kb-cron"
---

=== Lai–Robbins lower bound ===
In 1985, Lai and Robbins proved an asymptotic, problem-dependent lower bound on the regret. In 2018, Aurélien Garivier, Pierre Ménard and Gilles Stoltz proved a refined lower bound that also gives the second-order term.
It states that for every consistent algorithm on the set $\mathcal{P}([-\infty,1])$, that is, an algorithm for which, for every $(\nu_1,\dots,\nu_K)\in\mathcal{P}([-\infty,1])^K$, the regret $R_T$ is subpolynomial (i.e. $R_T=o_{T\to+\infty}(T^{\alpha})$ for all $\alpha>0$), we have:

$$R_T\geq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-\Omega_{T\to+\infty}(\ln\ln T).$$


This bound is asymptotic (as $T\to+\infty$) and gives a first-order lower bound of order $\ln T$, with the optimal constant in front of it, and a second-order term in $-\Omega(\ln\ln T)$.
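As a numerical illustration of the leading constant in this bound, the sketch below assumes Bernoulli arms, for which $\mathcal{K}_{\inf}(\nu_a,\mu^*)$ is taken to reduce to the Bernoulli Kullback–Leibler divergence $\mathrm{kl}(\mu_a,\mu^*)$. The restriction to Bernoulli arms, the function names and the example means are illustrative assumptions, not part of the source.

```python
import math


def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)) in nats, for 0 <= p <= 1 and 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def lower_bound_constant(means):
    """Sum over suboptimal arms of Delta_a / kl(mu_a, mu*).

    Assumes Bernoulli arms and a best mean strictly below 1.
    """
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


# Hypothetical three-armed instance.
means = [0.9, 0.8, 0.5]
constant = lower_bound_constant(means)
for horizon in (10**3, 10**5):
    # First-order term of the lower bound: constant * ln T.
    print(horizon, constant * math.log(horizon))
```

Arms whose mean is close to the best one dominate the constant, because their divergence to $\mu^*$ shrinks faster than their gap $\Delta_a$.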

=== Regret bound for IMED ===
If the distribution of every arm $a$ is supported on $(-\infty,1]$ (i.e. $\nu_a\in\mathcal{P}([-\infty,1])$), then the regret of the IMED algorithm satisfies

$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T+O(1).$$


If all the distributions $\nu_a$ are bounded, then there exists a constant $C>0$ such that, for $T$ large enough, the regret of IMED is upper bounded by

$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-C\ln\ln T.$$
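Read together with the first-order part of the lower bound, the $O(1)$ upper bound pins down the growth rate of IMED's regret. The short derivation below sketches this; the shorthand $C(\nu)$ for the common leading constant is introduced here only for illustration. IMED qualifies as a consistent algorithm because its regret is $O(\ln T)$, hence subpolynomial.

```latex
% C(\nu) is shorthand introduced for this derivation, not notation from the source.
C(\nu) = \sum_{a:\mu_a<\mu^*} \frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}

% Upper bound for IMED, divided by \ln T:
\frac{R_T}{\ln T} \le C(\nu) + \frac{O(1)}{\ln T}
\quad\Longrightarrow\quad
\limsup_{T\to+\infty} \frac{R_T}{\ln T} \le C(\nu)

% First-order (Lai--Robbins) lower bound, valid for every consistent algorithm:
\liminf_{T\to+\infty} \frac{R_T}{\ln T} \ge C(\nu)

% Combining both inequalities for IMED:
\lim_{T\to+\infty} \frac{R_T}{\ln T} = C(\nu)
```

In this sense IMED is asymptotically optimal: no consistent algorithm can achieve a smaller constant in front of $\ln T$.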


== Computation time ==
The algorithm only needs to compute $\mathcal{K}_{\inf}$ for suboptimal arms, which are pulled only $O(\ln T)$ times; this makes it much faster than KL-UCB. An even faster version of IMED, developed in 2023, uses a first-order Taylor expansion of $\mathcal{K}_{\inf}$.
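The following minimal sketch illustrates why the optimization is skipped for the empirically best arm. It relies on two assumptions that are not restated in this section: the usual IMED index $N_a\,\mathcal{K}_{\inf}(\hat{\nu}_a,\hat{\mu}^*)+\ln N_a$, and the dual form of $\mathcal{K}_{\inf}$ as a one-dimensional concave maximization. The function names, the ternary search and the sample data are hypothetical choices made for illustration.

```python
import math


def k_inf(samples, mu, upper=1.0, iters=80):
    """Approximate K_inf(F_hat, mu) for the empirical distribution of `samples`.

    Assumed dual form (not stated in this section):
        max over 0 <= lam <= 1/(upper - mu) of E[log(1 - (X - mu) * lam)].
    """
    n = len(samples)
    mean = sum(samples) / n
    if mean >= mu:
        return 0.0       # empirical mean already reaches mu: nothing to optimize
    if mu >= upper:
        return math.inf  # the dual objective is unbounded in this case

    def objective(lam):
        # Clamp the argument to avoid log(0) at the edge of the domain.
        return sum(math.log(max(1.0 - (x - mu) * lam, 1e-300)) for x in samples) / n

    lo, hi = 0.0, 1.0 / (upper - mu)
    for _ in range(iters):  # ternary search on a concave objective
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)


def imed_indices(rewards_per_arm):
    """Assumed IMED index N_a * K_inf + ln N_a; every arm needs at least one sample."""
    means = [sum(r) / len(r) for r in rewards_per_arm]
    best = max(means)
    indices = []
    for rewards, mean in zip(rewards_per_arm, means):
        n = len(rewards)
        if mean >= best:
            indices.append(math.log(n))  # K_inf is zero here, no optimization
        else:
            indices.append(n * k_inf(rewards, best) + math.log(n))
    return indices


# Hypothetical reward history for two arms with rewards in {0, 1}.
history = [[1, 1, 0, 1], [0, 0, 1, 0]]
indices = imed_indices(history)
next_arm = min(range(len(indices)), key=indices.__getitem__)  # smallest index is pulled
```

Under these assumptions, each round costs one numerical maximization per empirically suboptimal arm, and none for the arm that currently has the highest empirical mean.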

== See also ==
Multi-armed bandit
Kullback–Leibler Upper Confidence Bound
Confidence interval
