Standard
An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. / Seldin, Yevgeny; Lugosi, Gábor.
Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. ed. / Satyen Kale; Ohad Shamir. Proceedings of Machine Learning Research, 2017. pp. 1743-1759 (Proceedings of Machine Learning Research, Vol. 65).
Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed
Harvard
Seldin, Y & Lugosi, G 2017,
An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. in S Kale & O Shamir (eds),
Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. Proceedings of Machine Learning Research, Proceedings of Machine Learning Research, vol. 65, pp. 1743-1759, The 30th Annual Conference on Learning Theory (COLT), Amsterdam, Netherlands, 07/07/2017. <http://proceedings.mlr.press/v65/seldin17a/seldin17a.pdf>
APA
Seldin, Y., & Lugosi, G. (2017).
An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. In S. Kale, & O. Shamir (Eds.),
Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands (pp. 1743-1759). Proceedings of Machine Learning Research. Proceedings of Machine Learning Research Vol. 65
http://proceedings.mlr.press/v65/seldin17a/seldin17a.pdf
Vancouver
Seldin Y, Lugosi G. An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. In Kale S, Shamir O, editors, Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. Proceedings of Machine Learning Research. 2017. p. 1743-1759. (Proceedings of Machine Learning Research, Vol. 65).
Author
Seldin, Yevgeny ; Lugosi, Gábor. / An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. ed. / Satyen Kale ; Ohad Shamir. Proceedings of Machine Learning Research, 2017. pp. 1743-1759 (Proceedings of Machine Learning Research, Vol. 65).
Bibtex
@inproceedings{8693595e09fb4d788049ecb90ed2d020,
title = "An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits",
abstract = "We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.",
author = "Yevgeny Seldin and G{\'a}bor Lugosi",
year = "2017",
language = "English",
series = "Proceedings of Machine Learning Research",
pages = "1743--1759",
editor = "Satyen Kale and Ohad Shamir",
booktitle = "Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands",
publisher = "Proceedings of Machine Learning Research",
note = "The 30th Annual Conference on Learning Theory (COLT); Conference date: 07-07-2017 through 10-07-2017",
url = "http://www.learningtheory.org/colt2017/",
}
RIS
TY - GEN
T1 - An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits
AU - Seldin, Yevgeny
AU - Lugosi, Gábor
N1 - Conference code: 30
PY - 2017
Y1 - 2017
N2 - We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.
AB - We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.
M3 - Article in proceedings
T3 - Proceedings of Machine Learning Research
SP - 1743
EP - 1759
BT - Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands
A2 - Kale, Satyen
A2 - Shamir, Ohad
PB - Proceedings of Machine Learning Research
T2 - The 30th Annual Conference on Learning Theory (COLT)
Y2 - 7 July 2017 through 10 July 2017
ER -