---
title: "Algorithm IMED"
chunk: 2/2
source: "https://en.wikipedia.org/wiki/Algorithm_IMED"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T14:37:18.993972+00:00"
instance: "kb-cron"
---

=== Lai–Robbins lower bound ===
In 1985 Lai and Robbins proved an asymptotic, problem-dependent lower bound on regret. In 2018, Aurelien Garivier, Pierre Menard and Gilles Stoltz proved a refined lower bound that also gives the second-order term.
It states that every consistent algorithm on the set $\mathcal{P}((-\infty,1])$ satisfies the bound below. An algorithm is consistent if, for every $(\nu_1,\dots,\nu_K)\in\mathcal{P}((-\infty,1])^K$, its regret $R_T$ is subpolynomial, i.e. $R_T = o_{T\to+\infty}(T^{\alpha})$ for all $\alpha>0$. For every such algorithm:

  
    
      
        
$$R_T \geq \left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T - \Omega_{T\to+\infty}(\ln\ln T).$$
  

This bound is asymptotic (as $T\to+\infty$). Its first-order term is of order $\ln T$ with the optimal constant in front of it, and its second-order term is $-\Omega(\ln\ln T)$.
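
As a rough numerical illustration, the sketch below evaluates the constant in front of $\ln T$ for Bernoulli arms. It assumes the standard notation $\mu^* = \max_a \mu_a$ and $\Delta_a = \mu^* - \mu_a$. It also uses the fact that, for a Bernoulli arm and $\mu^* < 1$, $\mathcal{K}_{\inf}(\nu_a,\mu^*)$ over distributions bounded above by 1 reduces to the Bernoulli divergence $\mathrm{kl}(\mu_a,\mu^*)$. The function names are hypothetical and chosen for illustration only.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """kl(p, q) = KL(Bernoulli(p) || Bernoulli(q)), using the convention 0 * ln 0 = 0."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    kl = 0.0
    if p > 0.0:
        kl += p * math.log(p / q)
    if p < 1.0:
        kl += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return kl


def lai_robbins_constant(means: list[float]) -> float:
    """Coefficient of ln T in the lower bound, assuming Bernoulli arms with mu* < 1."""
    best = max(means)
    total = 0.0
    for mu in means:
        if mu < best:  # only suboptimal arms contribute to the sum
            total += (best - mu) / bernoulli_kl(mu, best)
    return total
```

For instance, `lai_robbins_constant([0.5, 0.4])` equals $0.1/\mathrm{kl}(0.4, 0.5) \approx 4.97$. On this two-armed problem, a consistent algorithm therefore incurs regret of at least about $4.97\ln T$ to first order.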

=== Regret bound for IMED ===
If the distribution of every arm $a$ is supported in $(-\infty,1]$ (i.e. $\nu_a\in\mathcal{P}((-\infty,1])$), then the regret of IMED satisfies

  
    
      
        
$$R_T \leq \left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T + O(1).$$
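
Both bounds depend on $\mathcal{K}_{\inf}$, which is also the main quantity IMED must compute. The sketch below evaluates it for an empirical distribution given as a list of samples. It assumes the dual representation due to Honda and Takemura:

$$\mathcal{K}_{\inf}(\nu,\mu)=\max_{0\le\lambda\le 1/(1-\mu)}\mathbb{E}_{X\sim\nu}\left[\ln(1-\lambda(X-\mu))\right],\qquad \mu<1.$$

The objective is concave in $\lambda$, so bisection on its derivative locates the maximiser. The helper names are hypothetical.

```python
import math


def _dual_objective(samples, mu, lam):
    """Empirical mean of ln(1 - lam * (X - mu)); -inf if a log argument is not positive."""
    total = 0.0
    for x in samples:
        arg = 1.0 - lam * (x - mu)
        if arg <= 0.0:
            return -math.inf
        total += math.log(arg)
    return total / len(samples)


def _dual_derivative(samples, mu, lam):
    """Derivative of the dual objective with respect to lam (-inf outside the domain)."""
    total = 0.0
    for x in samples:
        denom = 1.0 - lam * (x - mu)
        if denom <= 0.0:
            return -math.inf
        total -= (x - mu) / denom
    return total / len(samples)


def k_inf(samples, mu, iters=100):
    """K_inf of the empirical distribution of `samples` (all <= 1) at a level mu < 1."""
    if not samples or mu >= 1.0 or max(samples) > 1.0:
        raise ValueError("need non-empty samples <= 1 and mu < 1")
    if sum(samples) / len(samples) >= mu:
        return 0.0  # K_inf is 0 when the mean already reaches mu
    lo, hi = 0.0, 1.0 / (1.0 - mu)
    # The objective is concave, so its derivative decreases in lam: bisect for
    # the sign change, or converge to the endpoint hi if there is none.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _dual_derivative(samples, mu, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return max(0.0, _dual_objective(samples, mu, lo))
```

On Bernoulli-type samples the result agrees with the closed-form $\mathrm{kl}(\hat\mu,\mu)$. For example, four ones and six zeros at $\mu = 0.5$ give about $0.0201$.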
  

If all the distributions $\nu_a$ are bounded, then there exists a constant $C>0$ such that, for $T$ large enough, the regret of IMED is upper bounded by

  
    
      
        
$$R_T \leq \left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T - C\ln\ln T.$$
  

== Computation time ==
The algorithm only needs to compute $\mathcal{K}_{\inf}$ for the suboptimal arms, which are pulled only $O(\ln T)$ times. This makes it considerably faster than KL-UCB. A faster variant of IMED was developed in 2023, using a first-order Taylor expansion of $\mathcal{K}_{\inf}$.
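
The saving described above can be seen in the arm-selection rule. The sketch makes two assumptions:

- Rewards are Bernoulli, so the empirical $\mathcal{K}_{\inf}$ reduces to the Bernoulli divergence $\mathrm{kl}$.
- The index is the usual IMED index $I_a = N_a\,\mathcal{K}_{\inf}(\hat\nu_a,\hat\mu^*) + \ln N_a$, where $N_a$ is the number of pulls of arm $a$ and $\hat\mu^*$ is the best empirical mean. The arm with the smallest index is pulled.

Arms whose empirical mean reaches $\hat\mu^*$ have $\mathcal{K}_{\inf} = 0$, so no divergence is computed for them. The names `imed_choose` and `run_imed` are hypothetical.

```python
import math
import random


def bernoulli_kl(p, q):
    """kl(p, q) between Bernoulli(p) and Bernoulli(q), with 0 * ln 0 = 0 and +inf where infinite."""
    if q >= 1.0:
        return 0.0 if p >= 1.0 else math.inf
    if q <= 0.0:
        return 0.0 if p <= 0.0 else math.inf
    kl = 0.0
    if p > 0.0:
        kl += p * math.log(p / q)
    if p < 1.0:
        kl += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return kl


def imed_choose(counts, sums):
    """Return the arm with the smallest index N_a * K_inf + ln N_a (Bernoulli rewards assumed)."""
    means = [s / n for s, n in zip(sums, counts)]
    best_mean = max(means)
    best_arm, best_index = 0, math.inf
    for a, (n, mu) in enumerate(zip(counts, means)):
        # Empirically best arms have K_inf = 0, so no divergence is computed for them.
        kinf = 0.0 if mu >= best_mean else bernoulli_kl(mu, best_mean)
        index = n * kinf + math.log(n)
        if index < best_index:
            best_arm, best_index = a, index
    return best_arm


def run_imed(arm_means, horizon, seed=0):
    """Simulate IMED on Bernoulli arms and return how many times each arm was pulled."""
    k = len(arm_means)
    if horizon < k:
        raise ValueError("horizon must allow one initial pull per arm")
    rng = random.Random(seed)
    counts, sums = [0] * k, [0.0] * k
    for t in range(horizon):
        # Pull every arm once first so that N_a >= 1 and ln N_a is defined.
        a = t if t < k else imed_choose(counts, sums)
        reward = 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts
```

For example, `run_imed([0.8, 0.2], 2000)` should spend the large majority of its pulls on the first arm.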

== See also ==
Multi-armed bandit
Kullback–Leibler Upper Confidence Bound
Confidence interval
