An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits

Publication: Contribution to book/anthology/report › Article in proceedings › Research › Peer-reviewed

Yevgeny Seldin
Gábor Lugosi

We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of the regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime, the regret guarantee remains unchanged.
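The abstract states only the regret guarantees. For orientation, here is a minimal Python sketch of the general shape of an EXP3++-style sampler: exponential weights over importance-weighted cumulative loss estimates, mixed with per-arm exploration that shrinks as an arm's empirical gap grows. The learning rate, the exploration cap, the constant `c` and the exploration term $c(\ln t)^2/(t\hat\Delta^2)$ are assumptions chosen for illustration. This is not the parametrization or the gap-estimation strategy proposed in the paper, and the function names `exp3pp_style_probs` and `simulate` are hypothetical.

```python
import math
import random


def exp3pp_style_probs(cum_loss_est, t, c=18.0):
    """Return an EXP3++-style sampling distribution for round t >= 1 (K >= 2 arms).

    ASSUMPTION: the learning rate, the exploration cap and the
    gap-driven exploration term are illustrative choices, not the
    parametrization analysed in the paper.
    """
    k = len(cum_loss_est)
    # Exponential-weights learning rate (assumed schedule ~ sqrt(ln K / (t K))).
    eta = 0.5 * math.sqrt(math.log(k) / (t * k))
    best = min(cum_loss_est)
    eps = []
    for est in cum_loss_est:
        # Empirical gap: excess estimated loss per round, clipped to [0, 1].
        gap = min(1.0, (est - best) / t)
        # Gap-driven exploration (assumed form c * (ln(t+1))^2 / (t * gap^2));
        # an arm with zero estimated gap is capped only by eta and 1/(2K).
        xi = c * math.log(t + 1) ** 2 / (t * gap * gap) if gap > 0 else math.inf
        eps.append(min(1.0 / (2 * k), eta, xi))
    # Exponential weights, shifted by the best estimate for numerical stability.
    weights = [math.exp(-eta * (est - best)) for est in cum_loss_est]
    total = sum(weights)
    # Each eps <= 1/(2K), so sum(eps) <= 1/2 and the mixture is a valid distribution.
    mix = 1.0 - sum(eps)
    return [mix * w / total + e for w, e in zip(weights, eps)]


def simulate(means, horizon, seed=0):
    """Play Bernoulli arms with the given loss means and return pull counts."""
    rng = random.Random(seed)
    k = len(means)
    cum_loss_est = [0.0] * k
    pulls = [0] * k
    for t in range(1, horizon + 1):
        rho = exp3pp_style_probs(cum_loss_est, t)
        arm = rng.choices(range(k), weights=rho)[0]
        loss = 1.0 if rng.random() < means[arm] else 0.0
        # Importance-weighted (unbiased) estimate of the played arm's loss.
        cum_loss_est[arm] += loss / rho[arm]
        pulls[arm] += 1
    return pulls
```

For example, `simulate([0.2, 0.8], horizon=5000)` returns the number of pulls of each arm; one would expect most pulls to go to arm 0, although the exact split depends on the seed. The gap-driven term only becomes the binding cap once $t$ is large relative to $1/\hat\Delta^2$, which is the regime where the choice of exploration schedule affects the stochastic regret.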

Original language	English
Title of host publication	Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands
Editors	Satyen Kale, Ohad Shamir
Publisher	Proceedings of Machine Learning Research
Publication date	2017
Pages	1743-1759
Status	Published - 2017
Event	The 30th Annual Conference on Learning Theory (COLT), Amsterdam, Netherlands; duration: 7 Jul 2017 → 10 Jul 2017; conference number: 30; http://www.learningtheory.org/colt2017/

Conference

Conference	The 30th Annual Conference on Learning Theory (COLT)
Number	30
Country	Netherlands
City	Amsterdam
Period	07/07/2017 → 10/07/2017
Internet address	http://www.learningtheory.org/colt2017/

Series name	Proceedings of Machine Learning Research
Volume	65
ISSN	1938-7228
