kb/Algorithm_IMED-1.md at 44111d79dba90ad0d51bf58d5a753c19d087b417

turtle89431 1a02be5a8e Scrape wikipedia-science: 16831 new, 4190 updated, 21574 total (kb-cron)

2026-05-05 07:38:32 -07:00


title	chunk	source	category	tags	date_saved	instance
Algorithm IMED	2/2	https://en.wikipedia.org/wiki/Algorithm_IMED	reference	science, encyclopedia	2026-05-05T14:37:18.993972+00:00	kb-cron

=== Lai–Robbins lower bound ===

In 1985, Lai and Robbins proved an asymptotic, problem-dependent lower bound on regret. In 2018, Aurelien Garivier, Pierre Menard and Gilles Stoltz proved a refined lower bound that also gives the second-order term. It states that for every consistent algorithm on the set $\mathcal{P}([-\infty,1])$, that is, an algorithm for which, for every $(\nu_1,\dots,\nu_K)\in\mathcal{P}([-\infty,1])^K$, the regret $R_T$ is subpolynomial (i.e. $R_T=o_{T\to+\infty}(T^{\alpha})$ for all $\alpha>0$), we have:

$$R_T\geq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-\Omega_{T\to+\infty}(\ln\ln T).$$

This bound is asymptotic (as $T\to+\infty$). Its first-order term is of order $\ln T$, with the optimal constant in front of it, and its second-order term is $-\Omega(\ln\ln T)$.
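As a rough numerical illustration of the two terms in this bound, the sketch below evaluates the first-order constant for Bernoulli arms. It relies on an assumption not stated in this section: for Bernoulli arms, $\mathcal{K}_{\inf}(\nu_a,\mu^*)$ reduces to the binary Kullback–Leibler divergence $\mathrm{kl}(\mu_a,\mu^*)$. The function names and the example means are hypothetical.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q), clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def first_order_constant(means):
    """Sum over suboptimal arms of Delta_a / kl(mu_a, mu*).

    Assumption: Bernoulli arms, so K_inf is replaced by the binary KL divergence.
    """
    mu_star = max(means)
    return sum((mu_star - mu) / bernoulli_kl(mu, mu_star)
               for mu in means if mu < mu_star)

means = [0.5, 0.4, 0.3]  # illustrative arm means, not taken from the source
c = first_order_constant(means)
for T in (10**3, 10**6, 10**9):
    # Leading term c * ln T next to the scale ln ln T of the second-order term.
    print(T, round(c * math.log(T), 2), round(math.log(math.log(T)), 2))
```

The printed columns suggest how slowly $\ln\ln T$ grows compared with the leading $\ln T$ term, which is why the second-order term only matters as a refinement.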

=== Regret bound for IMED ===

If the distribution of every arm $a$ is supported on $(-\infty,1]$ (i.e. $\nu_a\in\mathcal{P}([-\infty,1])$), then the regret of IMED satisfies

$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T+O(1)$$

If all the distributions $\nu_a$ are bounded, then there exists a constant $C>0$ such that, for $T$ large enough, the regret of IMED is upper bounded by

$$R_T\leq\left(\sum_{a:\mu_a<\mu^*}\frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^*)}\right)\ln T-C\ln\ln T$$
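To see what these bounds look like in practice, the following is a minimal simulation sketch, not a reference implementation. It assumes the usual IMED index $N_a(t)\,\mathcal{K}_{\inf}(\hat\nu_a(t),\hat\mu^*(t))+\ln N_a(t)$ (defined outside this section) and restricts to Bernoulli arms, where $\mathcal{K}_{\inf}$ is taken to be the binary KL divergence. The names `run_imed` and `bernoulli_kl`, the seed, and the arm means are hypothetical.

```python
import math
import random

def bernoulli_kl(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q), clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def run_imed(means, horizon, seed=0):
    """Simulate IMED on Bernoulli arms; return (pull counts, pseudo-regret)."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    mu_star = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once to initialise the estimates
        else:
            mu_hat = [sums[a] / pulls[a] for a in range(k)]
            best_hat = max(mu_hat)
            # Assumed IMED index: N_a * K_inf(nu_hat_a, mu_hat*) + ln N_a,
            # with K_inf replaced by the binary KL (Bernoulli case only).
            index = [pulls[a] * bernoulli_kl(mu_hat[a], best_hat) + math.log(pulls[a])
                     for a in range(k)]
            arm = min(range(k), key=index.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]  # pseudo-regret uses the true means
    return pulls, regret

pulls, regret = run_imed([0.9, 0.2], horizon=2000)
leading = (0.9 - 0.2) / bernoulli_kl(0.2, 0.9) * math.log(2000)
print(pulls, round(regret, 2), round(leading, 2))
```

A single run is noisy, so comparing `regret` with the leading term `leading` is only indicative; averaging over many seeds would give a fairer picture of the asymptotic statement.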

== Computation time ==

The algorithm only needs to compute $\mathcal{K}_{\inf}$ for suboptimal arms, which are pulled $O(\ln T)$ times; this makes it much faster than KL-UCB. A faster version of IMED was developed in 2023, using a first-order Taylor expansion of $\mathcal{K}_{\inf}$.
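To give a sense of why evaluating $\mathcal{K}_{\inf}$ is the costly step, here is a minimal sketch that computes it for an empirical distribution. It assumes the variational form $\mathcal{K}_{\inf}(\hat\nu,\mu)=\max_{0\le\lambda\le 1/(1-\mu)}\mathbb{E}_{\hat\nu}[\ln(1-(X-\mu)\lambda)]$ from the IMED literature, which is not given in this section, and solves the one-dimensional concave maximisation by ternary search. The function name and the iteration count are illustrative choices, and this is not the 2023 fast variant.

```python
import math

def k_inf_empirical(samples, mu, iters=60):
    """Approximate K_inf(empirical distribution of samples, mu).

    Assumes rewards <= 1 and mu < 1, and uses the assumed dual form
    max over lam in [0, 1/(1-mu)] of mean(log(1 - (x - mu) * lam)).
    """
    if not samples or mu >= 1 or max(samples) > 1:
        raise ValueError("need non-empty samples with values <= 1 and mu < 1")
    mean = sum(samples) / len(samples)
    if mean >= mu:
        return 0.0  # the objective is concave with non-positive slope at lam = 0

    def objective(lam):
        # Every log argument is positive for lam < 1 / (1 - mu) when x <= 1.
        return sum(math.log(1 - (x - mu) * lam) for x in samples) / len(samples)

    lo, hi = 0.0, 1.0 / (1.0 - mu)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # the maximiser lies to the right of m1
        else:
            hi = m2  # the maximiser lies to the left of m2
    return objective((lo + hi) / 2)

print(k_inf_empirical([1, 1, 0, 0, 0], 0.5))
```

Each call scans all samples at every search step, so the cost grows with the number of observations of the arm; this is the kind of work that is only needed for the rarely pulled suboptimal arms.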

== See also ==

- Multi-armed bandit
- Kullback–Leibler Upper Confidence Bound
- Confidence interval

