An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Standard

An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. / Seldin, Yevgeny; Lugosi, Gábor.

Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. ed. / Satyen Kale; Ohad Shamir. Proceedings of Machine Learning Research, 2017. pp. 1743-1759 (Proceedings of Machine Learning Research, Vol. 65).

Harvard

Seldin, Y & Lugosi, G 2017, An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. in S Kale & O Shamir (eds), Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. Proceedings of Machine Learning Research, Proceedings of Machine Learning Research, vol. 65, pp. 1743-1759, The 30th Annual Conference on Learning Theory (COLT), Amsterdam, Netherlands, 07/07/2017. <http://proceedings.mlr.press/v65/seldin17a/seldin17a.pdf>

APA

Seldin, Y., & Lugosi, G. (2017). An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. In S. Kale, & O. Shamir (Eds.), Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands (pp. 1743-1759). Proceedings of Machine Learning Research. Proceedings of Machine Learning Research Vol. 65 http://proceedings.mlr.press/v65/seldin17a/seldin17a.pdf

Vancouver

Seldin Y, Lugosi G. An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. In Kale S, Shamir O, editors, Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. Proceedings of Machine Learning Research. 2017. p. 1743-1759. (Proceedings of Machine Learning Research, Vol. 65).

Author

Seldin, Yevgeny ; Lugosi, Gábor. / An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits. Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands. ed. / Satyen Kale ; Ohad Shamir. Proceedings of Machine Learning Research, 2017. pp. 1743-1759 (Proceedings of Machine Learning Research, Vol. 65).

Bibtex

@inproceedings{8693595e09fb4d788049ecb90ed2d020,

title = "An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits",

abstract = "We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.",

author = "Yevgeny Seldin and G{\'a}bor Lugosi",

year = "2017",

language = "English",

series = "Proceedings of Machine Learning Research",

pages = "1743--1759",

editor = "Satyen Kale and Ohad Shamir",

booktitle = "Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands",

publisher = "Proceedings of Machine Learning Research",

note = "The 30th Annual Conference on Learning Theory (COLT); Conference date: 07-07-2017 through 10-07-2017",

url = "http://www.learningtheory.org/colt2017/",

}

RIS

TY - GEN

T1 - An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits

AU - Seldin, Yevgeny

AU - Lugosi, Gábor

N1 - Conference code: 30

PY - 2017

Y1 - 2017

N2 - We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.

AB - We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.

M3 - Article in proceedings

T3 - Proceedings of Machine Learning Research

SP - 1743

EP - 1759

BT - Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands

A2 - Kale, Satyen

A2 - Shamir, Ohad

PB - Proceedings of Machine Learning Research

T2 - The 30th Annual Conference on Learning Theory (COLT)

Y2 - 7 July 2017 through 10 July 2017

ER -

ID: 197766233