A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Publikation: Bidrag til bog/antologi/rapport › Konferencebidrag i proceedings › Forskning › fagfællebedømt

Dokumenter

Fulltext
Indsendt manuskript, 336 KB, PDF-dokument

Chloé Rouyer
Dirk van der Hoeven
Nicolò Cesa-Bianchi
Seldin, Yevgeny

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is O~(αT−−−√), where T is the time horizon and α is the independence number of the feedback graph. The bound against stochastic environments is O((lnT)2maxS∈I(G)∑i∈SΔ−1i) where I(G) is the family of all independent sets in a suitably defined undirected version of the graph and Δi are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.

Originalsprog	Engelsk
Titel	Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
Forlag	NeurIPS Proceedings
Publikationsdato	2022
Sider	35035-35048
ISBN (Elektronisk)	9781713871088
Status	Udgivet - 2022
Begivenhed	36th Conference on Neural Information Processing Systems (NeurIPS 2022). - New Orleans/ Virtual, USA Varighed: 28 nov. 2022 → 9 dec. 2022

Konference

Konference	36th Conference on Neural Information Processing Systems (NeurIPS 2022).
Land	USA
By	New Orleans/ Virtual
Periode	28/11/2022 → 09/12/2022

Forskning