| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Gittins index | 3/3 | https://en.wikipedia.org/wiki/Gittins_index | reference | science, encyclopedia | 2026-05-05T09:50:20.644914+00:00 | kb-cron |
=== Generalized index ===
If the probability of survival $\beta(i)$ depends on the state $i$, a generalization introduced by Sonin (2008) defines the Gittins index $\alpha(i)$ as the maximum discounted total reward per chance of termination:

$$\alpha(i)=\sup_{\tau>0}\frac{R^{\tau}(i)}{Q^{\tau}(i)}$$

where

$$R^{\tau}(i)=\left\langle \sum_{t=0}^{\tau-1}R[Z(t)]\right\rangle_{Z(0)=i}$$

$$Q^{\tau}(i)=\left\langle 1-\prod_{t=0}^{\tau-1}\beta[Z(t)]\right\rangle_{Z(0)=i}$$

If $\beta^{t}$ is replaced by $\prod_{j=0}^{t-1}\beta[Z(j)]$ in the definitions of $\nu(i)$, $w(i)$ and $h(i)$, then it holds that

$$\alpha(i)=h(i)=w(i)$$

$$\alpha(i)\neq k\nu(i),\ \forall k$$

This observation leads Sonin to conclude that $\alpha(i)$, and not $\nu(i)$, is the "true meaning" of the Gittins index.
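For a small finite-state chain, the ratio that defines the generalized index can be evaluated numerically. The following Python code is a minimal sketch, not Sonin's algorithm. It rests on three assumptions that are not stated above: the supremum is attained by the first exit time from some set of states containing i (as in the classical constant-β setting), every survival probability is strictly below 1, and the reward at time t is collected only if the process has survived all earlier steps. The function name `generalized_gittins_index` and the inputs `P`, `R` and `beta` are hypothetical.

```python
from itertools import combinations

import numpy as np


def generalized_gittins_index(P, R, beta, i):
    """Brute-force estimate of alpha(i) = sup_tau R^tau(i) / Q^tau(i).

    P    : (n, n) transition matrix of the chain Z
    R    : (n,) one-step rewards R[s]
    beta : (n,) state-dependent survival probabilities, each assumed < 1
    i    : start state

    Assumption: only first-exit times from sets C containing i are tried.
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = len(R)
    others = [s for s in range(n) if s != i]

    best = -np.inf
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            C = [i, *extra]  # continuation set, start state first
            # Survive the current step, then move to a state that is still in C.
            A = beta[C][:, None] * P[np.ix_(C, C)]
            M = np.eye(len(C)) - A
            # r = R_C + A r: expected reward collected before exit or termination.
            r = np.linalg.solve(M, R[C])
            # q = (1 - beta_C) + A q: probability of terminating before exit.
            q = np.linalg.solve(M, 1.0 - beta[C])
            best = max(best, r[0] / q[0])
    return best


if __name__ == "__main__":
    # Hypothetical two-state chain: state 0 moves to state 1, state 1 is absorbing.
    P = [[0.0, 1.0], [0.0, 1.0]]
    R = [0.0, 1.0]
    beta = [0.5, 0.5]
    print(generalized_gittins_index(P, R, beta, 0))
    print(generalized_gittins_index(P, R, beta, 1))
```

In this toy chain the index is approximately 1 for state 0 and 2 for state 1. The enumeration is exponential in the number of states, so it is only suitable for checking the definition on small examples.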
== Queueing theory ==
In queueing theory, the Gittins index is used to determine the optimal scheduling of jobs, e.g., in an M/G/1 queue. The mean completion time of jobs under a Gittins index schedule can be determined using the SOAP approach. Note that the dynamics of the queue are intrinsically Markovian, and stochasticity is due to the arrival and service processes. This is in contrast to most work in the learning literature, where stochasticity is explicitly accounted for through a noise term.
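As an illustration of how such a schedule ranks jobs, the sketch below computes the index of a single job from its age (the service it has received so far). It uses the usual formulation for job scheduling, which is not spelled out in the text above: the index at age a is the best achievable ratio of the probability of completing within some further service quantum to the expected service spent in that quantum. The sketch assumes a job-size distribution with finitely many support points, in which case only quanta ending at a support point need to be checked. The function name `gittins_job_index` and its parameters are hypothetical.

```python
def gittins_job_index(sizes, probs, age):
    """Gittins index of a job of the given age under a discrete size distribution.

    sizes : possible job sizes
    probs : probability of each size (same order as sizes)
    age   : service received so far; the job is assumed not yet complete
    """
    if not any(s > age for s in sizes):
        raise ValueError("age must be below the largest possible job size")

    best = 0.0
    for b in sizes:
        if b <= age:
            continue
        # Probability mass of jobs that complete when served from age up to b.
        done = sum(p for s, p in zip(sizes, probs) if age < s <= b)
        # Expected service spent on the job while serving it from age up to b.
        work = sum(p * (min(s, b) - age) for s, p in zip(sizes, probs) if s > age)
        # Conditioning on size > age rescales both terms equally, so it cancels.
        best = max(best, done / work)
    return best


if __name__ == "__main__":
    # Hypothetical workload: half the jobs have size 1, half have size 10.
    for age in (0, 1, 9):
        print(age, gittins_job_index([1, 10], [0.5, 0.5], age))
```

A Gittins schedule serves the job with the highest index. In the example workload a fresh job has a higher index than one that has already received one unit of service, because the latter is now known to be a long job.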
== Fractional problems ==
While conventional Gittins indices induce a policy that optimizes the accrual of a reward, a common problem setting consists of optimizing the ratio of accrued rewards. For example, this is the case for systems that maximize bandwidth (data over time) or minimize power consumption (energy over time). This class of problems differs from the optimization of a semi-Markov reward process, because the latter might select states with a disproportionate sojourn time just to accrue a higher reward. Instead, it corresponds to the class of linear-fractional Markov reward optimization problems. However, a detrimental aspect of such ratio optimizations is that, once the achieved ratio in some state is high, the optimization might select states leading to a low ratio because they bear a high probability of termination, so that the process is likely to terminate before the ratio drops significantly. A problem setting that prevents such early terminations defines the optimization as the maximization of the future ratio seen by each state. An indexation is conjectured to exist for this problem, to be computable as a simple variation on existing restart-in-state or state-elimination algorithms, and has been evaluated to work well in practice.
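The early-termination effect described above can be reproduced on a toy chain. The Python sketch below reads "ratio of accrued rewards" as the ratio of two expected totals accumulated until termination, which is an assumption; the text does not fix the exact objective. It evaluates a fixed chain only and performs no optimization. The function name `accrued_ratio` and all inputs are hypothetical, and every survival probability is assumed to be strictly below 1.

```python
import numpy as np


def accrued_ratio(P, beta, numer, denom, start):
    """Expected accrued numerator reward over expected accrued denominator reward.

    P     : (n, n) transition matrix
    beta  : (n,) survival probabilities per state, each assumed < 1
    numer : (n,) per-step numerator reward, e.g. data sent
    denom : (n,) per-step denominator reward, e.g. time spent
    start : start state
    """
    P = np.asarray(P, dtype=float)
    beta = np.asarray(beta, dtype=float)
    M = np.eye(len(beta)) - beta[:, None] * P
    # Each total v solves v = reward + diag(beta) P v (accumulate until termination).
    total_numer = np.linalg.solve(M, np.asarray(numer, dtype=float))
    total_denom = np.linalg.solve(M, np.asarray(denom, dtype=float))
    return total_numer[start] / total_denom[start]


if __name__ == "__main__":
    # State 0 sends 4 units of data in one time step, then moves to state 1,
    # which sends nothing and only consumes time.
    P = [[0.0, 1.0], [0.0, 1.0]]
    data = [4.0, 0.0]
    time = [1.0, 1.0]
    print(accrued_ratio(P, [0.5, 0.5], data, time, start=0))
    # Same chain, but the unproductive state now terminates much sooner.
    print(accrued_ratio(P, [0.5, 0.1], data, time, start=0))
```

The second call yields the higher ratio even though state 1 itself is no more productive: terminating sooner simply cuts short the time spent accruing nothing, which is the incentive toward early termination that the future-ratio formulation is meant to remove.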
== References ==
Scully, Ziv; Harchol-Balter, Mor; Scheller-Wolf, Alan (2018). "SOAP: One Clean Analysis of All Age-Based Scheduling Policies". Proceedings of the ACM on Measurement and Analysis of Computing Systems. 2 (1). ACM: 16. doi:10.1145/3179419. S2CID 216145213.
Berry, Donald A.; Fristedt, Bert (1985). Bandit problems: Sequential allocation of experiments. Monographs on Statistics and Applied Probability. London: Chapman & Hall. ISBN 978-0-412-24810-8.
Gittins, J.C. (1989). Multi-armed bandit allocation indices. Wiley-Interscience Series in Systems and Optimization. Foreword by Peter Whittle. Chichester: John Wiley & Sons, Ltd. ISBN 978-0-471-92059-5.
Weber, R.R. (November 1992). "On the Gittins index for multiarmed bandits". The Annals of Applied Probability. 2 (4): 1024–1033. doi:10.1214/aoap/1177005588. JSTOR 2959678.
Katehakis, Michael N.; Veinott, Arthur F. Jr. (1987). "The multi-armed bandit problem: decomposition and computation". Mathematics of Operations Research. 12 (2): 262–268. doi:10.1287/moor.12.2.262. JSTOR 3689689. S2CID 656323.
Cowan, W.; Katehakis, M.N. (2014). "Multi-armed Bandits under General Depreciation and Commitment". Probability in the Engineering and Informational Sciences. 29: 51–76. doi:10.1017/S0269964814000217.
== External links ==
[1] Matlab/Octave implementation of the index computation algorithms
Cowan, Robin (1991). "Tortoises and Hares: Choice Among Technologies of Unknown Merit". The Economic Journal. 101 (407): 801–814. doi:10.2307/2233856. JSTOR 2233856.