Factored Bandits

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

We introduce the factored bandits model, a framework for learning with limited (bandit) feedback in which actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds for the problem that match up to constant factors. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms dominate up to time horizons that are exponential in the number of arms).
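For orientation only, the sketch below illustrates the action structure described in the abstract: the joint action set is the Cartesian product of several atomic action sets, and the learner only observes noisy bandit feedback for the joint action it plays. The atomic action sets, the reward function, and the uniform exploration loop are hypothetical stand-ins for illustration; they are not the algorithm or reward model proposed in the paper.

```python
import itertools
import random

# Illustrative atomic action sets; a joint action picks one atomic action per factor.
atomic_actions = [
    ["a0", "a1", "a2"],  # factor 0
    ["b0", "b1"],        # factor 1
    ["c0", "c1", "c2"],  # factor 2
]

# The joint action set is the Cartesian product of the atomic action sets.
joint_actions = list(itertools.product(*atomic_actions))

def pull(joint_action):
    """Return noisy bandit feedback for one joint action (hypothetical reward)."""
    # Hypothetical mean reward: normalized sum of the chosen index in each factor.
    mean = sum(options.index(a) for a, options in zip(joint_action, atomic_actions))
    mean /= sum(len(options) - 1 for options in atomic_actions)
    return mean + random.gauss(0.0, 0.1)

# Naive uniform exploration with running means, only to show the interaction loop.
counts = {a: 0 for a in joint_actions}
means = {a: 0.0 for a in joint_actions}
for _ in range(2000):
    a = random.choice(joint_actions)
    r = pull(a)
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]

print("empirically best joint action:", max(means, key=means.get))
```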
Original language: English
Title: Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada
Number of pages: 10
Publisher: NIPS Proceedings
Publication date: 2018
Status: Published - 2018
Event: 32nd Annual Conference on Neural Information Processing Systems, Montreal, Canada
Duration: 2 Dec 2018 - 8 Dec 2018
Conference number: 32
https://nips.cc/Conferences/2018

Conference

Conference: 32nd Annual Conference on Neural Information Processing Systems
Number: 32
Location: Montreal
Country: Canada
City: Montreal
Period: 02/12/2018 - 08/12/2018
Internet address: https://nips.cc/Conferences/2018

Series name: Advances in Neural Information Processing Systems
Volume: 31
ISSN: 1049-5258
