kb/Best_arm_identification-0.md at 2fbd85a83fe048bfb2e2feab15b77d5a79c2afae

turtle89431 1a02be5a8e Scrape wikipedia-science: 16831 new, 4190 updated, 21574 total (kb-cron)

2026-05-05 07:38:32 -07:00

9.4 KiB


title	chunk	source	category	tags	date_saved	instance
Best arm identification	1/3	https://en.wikipedia.org/wiki/Best_arm_identification	reference	science, encyclopedia	2026-05-05T14:37:26.259834+00:00	kb-cron

Best arm identification (BAI) is a sequential one-player game in which the player has to find the best action (arm) among a list of actions (arms) by collecting information as efficiently as possible. It is a multi-armed bandit game, since the player only gets information about an arm by playing it. The most common objective in multi-armed bandit games is to minimize the regret (i.e., play the best action as much as possible), but in BAI the goal is to find the best arm as efficiently as possible. This problem naturally arises in scenarios such as adaptive clinical trials, where the number of patients is limited and quantifying the confidence in a treatment is important. It also arises in hyperparameter optimization, where the goal is to find the optimal choice of hyperparameters for an algorithm with the smallest possible number of experiments, since each experiment can be costly in terms of time, energy, or money.

== Stochastic multi-armed bandit ==

The stochastic multi-armed bandit (MAB) is a sequential game with one player and $K$ actions (arms). Each arm has an unknown probability distribution associated with it. At each turn, the player chooses one action and receives an observation drawn from the probability distribution associated with that arm. The more often an arm is played, the more information the player gains about its distribution.
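
The interaction described above can be simulated in a few lines. The following is a minimal sketch, assuming Bernoulli-distributed arms; the class name `BernoulliBandit`, its `pull` method, the helper `empirical_mean` and the mean values are hypothetical choices made for illustration, not part of the article or of any library.

```python
import random


class BernoulliBandit:
    """Toy stochastic bandit: arm k returns 1.0 with probability means[k], else 0.0."""

    def __init__(self, means, seed=None):
        self.means = list(means)        # hidden from the player in a real game
        self.rng = random.Random(seed)  # private generator for reproducibility

    def pull(self, k):
        """Play arm k once and return a single observation."""
        return 1.0 if self.rng.random() < self.means[k] else 0.0


def empirical_mean(bandit, k, n):
    """Average of n independent plays of arm k."""
    return sum(bandit.pull(k) for _ in range(n)) / n


bandit = BernoulliBandit([0.2, 0.7], seed=0)  # illustrative means
for n in (10, 100, 10_000):
    # The estimate of arm 1's mean typically tightens as n grows.
    print(n, empirical_mean(bandit, 1, n))
```

The player only ever sees the values returned by `pull`, which is why information about an arm can be gained only by playing it.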

=== Best arm identification ===

In BAI, the goal is to find the arm whose probability distribution has the highest mean. BAI may be either fixed confidence or fixed horizon. In a fixed-confidence game, a confidence parameter $\delta$ (the tolerated probability of error) is fixed at the beginning of the game, and the goal is to find the best arm with this confidence level in as few turns as possible. In a fixed-horizon game, the number of turns $T$ is fixed, and the goal is to find the best arm with the highest possible confidence in $T$ turns.

=== Math formalisation ===

We have one player and $K$ actions (arms). Behind each arm $k \in \{1, \ldots, K\}$ lies an unknown distribution $\nu_k$ with mean $\mu_k$. Each distribution $\nu_k$ belongs to a known family $\mathcal{D}$ (such as the set of Gaussian distributions or Bernoulli distributions). At each time step $t$, the player selects an arm $a_t$ and observes an independent sample $X_t \sim \nu_{a_t}$ from the corresponding distribution. We denote by $\mu^* := \max_{a} \mu_a$ the highest mean. An arm $a$ that satisfies $\mu_a = \mu^*$ is called an optimal arm; otherwise it is called a suboptimal arm. In best arm identification (BAI), the objective is to identify an optimal arm. Two main settings for BAI appear in the literature:

Fixed confidence: In this setting, one typically assumes that there exists a unique optimal arm, denoted $a^*$. A confidence level $\delta \in (0,1)$ is specified at the beginning. The algorithm must stop at some finite stopping time $\tau_\delta < +\infty$ and return an arm $\hat{a}_{\tau_\delta}$ such that the probability of error is bounded: $\mathbb{P}(\hat{a}_{\tau_\delta} \neq a^*) \leq \delta$. The objective is to minimize the expected sample complexity $\mathbb{E}[\tau_\delta]$. Such a setting appears when a constraint on the confidence is required (for example, requiring a confidence level of 95% corresponds to $\delta = 1 - 0.95 = 0.05$).

Fixed horizon: In this setting, the number of samples $T$ is fixed in advance. The goal is to design an algorithm that minimizes the probability of misidentifying the optimal arm: $\mathbb{P}(\hat{a}_T \neq a^*)$. This setting appears when the number of experiments is limited (for drug tests, the number of patients can be fixed in advance).
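
For the fixed-horizon setting, one well-known strategy is Sequential Halving, which splits the budget $T$ across about $\log_2 K$ rounds and discards the worse half of the surviving arms after each round. The sketch below is illustrative only and is not taken from the article: the function name `sequential_halving`, the `pull` callback and the simulated Bernoulli means are assumptions made for the example.

```python
import math
import random


def sequential_halving(pull, num_arms, budget):
    """Return (recommended_arm, pulls_used) using at most `budget` pulls.

    `pull(k)` is a caller-supplied function returning one sample from arm k.
    """
    rounds = max(1, math.ceil(math.log2(num_arms)))
    if budget < num_arms * rounds:
        raise ValueError("budget too small to sample every arm in every round")
    survivors = list(range(num_arms))
    used = 0
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Split this round's share of the budget evenly among surviving arms.
        pulls_per_arm = budget // (len(survivors) * rounds)
        means = {}
        for k in survivors:
            means[k] = sum(pull(k) for _ in range(pulls_per_arm)) / pulls_per_arm
            used += pulls_per_arm
        # Keep the better half (rounded up), ranked by this round's empirical means.
        survivors.sort(key=lambda k: means[k], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0], used


# Simulated Bernoulli arms (made-up means); arm 2 has the highest hidden mean.
rng = random.Random(0)
true_means = [0.2, 0.5, 0.8, 0.4]
arm, used = sequential_halving(
    lambda k: 1.0 if rng.random() < true_means[k] else 0.0,
    num_arms=len(true_means),
    budget=4000,
)
print(arm, used)
```

The recommendation `arm` plays the role of $\hat{a}_T$. The sketch never exceeds the budget, but integer division may leave a few pulls unused.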

=== Example of simple modelling ===

Suppose we have $K$ treatments and want to determine, with a confidence level of 95%, which treatment is best at healing a specific disease. Each treatment $k$ heals the disease with probability $\mu_k$ and fails otherwise, so each distribution $\nu_k$ is a Bernoulli distribution and $\mathcal{D}$ is the set of Bernoulli distributions. We can use a BAI algorithm to minimize $\mathbb{E}[\tau_{0.05}]$, the expected number of patients required to find the best treatment with probability at least 95%.
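
As a rough illustration of this example, the sketch below runs successive elimination, one simple fixed-confidence strategy, on simulated treatments. It is not the article's method and is not sample-optimal. The healing probabilities are made up, and the names `successive_elimination`, `pull` and `max_rounds` are hypothetical. The confidence radius is a conservative choice: by Hoeffding's inequality and a union bound over arms and rounds, all intervals hold simultaneously with probability at least $1 - \delta$ for rewards in $[0, 1]$.

```python
import math
import random


def successive_elimination(pull, num_arms, delta, max_rounds=1_000_000):
    """Return (arm, total_pulls); the arm is wrong with probability at most delta.

    Assumes rewards lie in [0, 1]. `pull(k)` returns one sample from arm k.
    """
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    total_pulls = 0
    for n in range(1, max_rounds + 1):
        # Every active arm has been played exactly n times after this loop.
        for k in active:
            sums[k] += pull(k)
            total_pulls += 1
        # Hoeffding radius with a union bound over arms and rounds.
        radius = math.sqrt(math.log(4 * num_arms * n * n / delta) / (2 * n))
        best_lower = max(sums[k] / n for k in active) - radius
        # Drop arms whose upper bound falls below the best lower bound.
        active = [k for k in active if sums[k] / n + radius >= best_lower]
        if len(active) == 1:
            return active[0], total_pulls
    raise RuntimeError("no decision within max_rounds")


rng = random.Random(1)
heal_prob = [0.3, 0.5, 0.8]  # made-up values, unknown to the algorithm
best, patients = successive_elimination(
    lambda k: 1.0 if rng.random() < heal_prob[k] else 0.0,
    num_arms=len(heal_prob),
    delta=0.05,
)
print(best, patients)
```

Here `patients` is one realization of the stopping time $\tau_{0.05}$. Averaging it over many independent runs would estimate $\mathbb{E}[\tau_{0.05}]$ for this particular strategy.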

== Applications ==

Best arm identification naturally arises in several practical domains:

Adaptive clinical trials: The objective is to identify the most effective treatment based on sequentially collected patient data. Each treatment can be modeled as having an underlying distribution of outcomes. The goal is to identify the treatment with the highest expected outcome with high confidence (fixed-confidence setting with level $\delta$) while minimizing the number of trial patients (that is, minimizing $\mathbb{E}[\tau_\delta]$), since enrolling patients is costly and less effective drugs should be given to as few patients as possible.

Hyperparameter tuning: Each hyperparameter setting of a machine learning model is treated as an arm, and the goal is to select the best configuration with as few experiments as possible, since experiments are costly in time and energy.

== Fixed confidence level ==
