An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of the regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime, the regret guarantee remains unchanged.
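As a rough illustration of what dropping one power of $\ln t$ means (a sketch for intuition, not code or numbers from the paper): the ratio of $(\ln t)^3$ to $(\ln t)^2$ is exactly $\ln t$, so the horizon-dependent factor shrinks by $\ln t$. The snippet below uses a hypothetical helper `log_factor_ratio` to compare the two factors for a few horizons. It ignores constants, the number of arms and all gap-dependent terms, so the printed values are not regret bounds.

```python
import math


def log_factor_ratio(t: float) -> float:
    """Ratio of the old (ln t)^3 horizon factor to the new (ln t)^2 factor.

    Hypothetical helper for illustration only; mathematically this equals ln t.
    """
    return math.log(t) ** 3 / math.log(t) ** 2


# Compare only the horizon-dependent factors; constants and gap terms are omitted.
for t in (10**3, 10**6, 10**9):
    old = math.log(t) ** 3  # (ln t)^3, original EXP3++ analysis
    new = math.log(t) ** 2  # (ln t)^2, improved parametrization
    print(f"t={t:>10}: (ln t)^3={old:9.1f}  (ln t)^2={new:7.1f}  ratio={log_factor_ratio(t):5.1f}")
```

For a horizon of one million rounds the ratio is $\ln 10^6 \approx 13.8$, so the improvement grows slowly but without bound as $t$ increases.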
Original language: English
Title: Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands
Editors: Satyen Kale, Ohad Shamir
Publisher: Proceedings of Machine Learning Research
Publication date: 2017
Pages: 1743-1759
Status: Published, 2017
Event: The 30th Annual Conference on Learning Theory (COLT), Amsterdam, Netherlands
Duration: 7 Jul 2017 to 10 Jul 2017
Conference number: 30
http://www.learningtheory.org/colt2017/

Conference

Conference: The 30th Annual Conference on Learning Theory (COLT)
Number: 30
Country: Netherlands
City: Amsterdam
Period: 07/07/2017 to 10/07/2017
Series: Proceedings of Machine Learning Research
Volume: 65
ISSN: 1938-7228

ID: 197766233