On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits

Antoine Barrier; Aurélien Garivier; Gilles Stoltz

Preprint, Working Paper. Year: 2022

Antoine Barrier

Role: Author
PersonId: 1339503
IdHAL: antoine-barrier
ORCID: 0000-0001-5224-0581

Unité de Mathématiques Pures et Appliquées

Laboratoire de Mathématiques d'Orsay

Statistique mathématique et apprentissage

Aurélien Garivier

Role: Author
PersonId: 4986
IdHAL: aurelien-garivier
ORCID: 0000-0002-4906-9573
IdRef: 111902495

Unité de Mathématiques Pures et Appliquées

Laboratoire de l'Informatique du Parallélisme

Gilles Stoltz

Role: Author
PersonId: 738739
IdHAL: gilles-stoltz
ORCID: 0000-0003-1240-1007
IdRef: 091575419

Laboratoire de Mathématiques d'Orsay

Statistique mathématique et apprentissage

Abstract

We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget T. We consider general, possibly non-parametric, models D for distributions over the arms; an overarching example is the model D = P(0,1) of all probability distributions over [0,1]. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on information-theoretic quantities that correspond to infima over Kullback-Leibler divergences between some distributions in D and a given distribution. This is made possible by a refined analysis of the successive-rejects strategy of Audibert, Bubeck, and Munos (2010). We finally provide lower bounds on the same average log-probability, also in terms of the same new information-theoretic quantities; these lower bounds are larger when the (natural) assumptions on the considered strategies are stronger. All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.
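
The successive-rejects strategy mentioned above can be illustrated with a minimal sketch. It follows the phase lengths as commonly stated for Audibert, Bubeck, and Munos (2010) and does not reproduce the refined analysis of this paper. The function names `successive_rejects` and `bernoulli`, the Bernoulli arms, and the budget value are illustrative assumptions, not taken from the source.

```python
import math
import random


def bernoulli(p):
    """Hypothetical arm: returns a sampler of Bernoulli(p) rewards in {0, 1}."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


def successive_rejects(arms, budget, rng):
    """Sketch of successive rejects with K arms and a fixed budget T.

    arms   : list of callables taking an RNG and returning a reward in [0, 1]
    budget : total number of pulls T allowed (must exceed K)
    rng    : a random.Random instance
    Returns the index of the single arm that survives all K - 1 phases.
    """
    K = len(arms)
    if K < 2 or budget <= K:
        raise ValueError("need at least 2 arms and a budget larger than K")

    # Normalizing constant: 1/2 + sum_{i=2}^{K} 1/i
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))

    active = list(range(K))
    sums = [0.0] * K    # cumulative reward per arm
    counts = [0] * K    # number of pulls per arm
    n_prev = 0

    for k in range(1, K):
        # Cumulative number of pulls each active arm has after phase k
        n_k = math.ceil((budget - K) / (log_bar * (K + 1 - k)))
        for a in active:
            for _ in range(n_k - n_prev):
                sums[a] += arms[a](rng)
                counts[a] += 1
        n_prev = n_k
        # Reject the active arm with the lowest empirical mean
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)

    return active[0]


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [bernoulli(0.2), bernoulli(0.5), bernoulli(0.9)]
    print(successive_rejects(arms, budget=3000, rng=rng))
```

Because the rewards are only assumed to lie in [0, 1], the same routine applies unchanged to any samplers from the model of all distributions over [0, 1]; the Bernoulli arms are merely the simplest case to simulate.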

Keywords

Multi-armed bandits; Best-arm identification; Non-parametric models; Kullback-Leibler divergences; Information-theoretic bounds

Domains

Other [stat.ML]; Statistics [math.ST]; Information Theory [cs.IT]; Machine Learning [cs.LG]

Main file

BaGaSt--BAI-fixed-T-v3.pdf (502.16 KB)

Origin: Files produced by the author(s)

Contributor: Gilles Stoltz

https://hal.science/hal-03792668

Submitted on: Friday, September 30, 2022, 12:28:57

Last modified on: Tuesday, March 12, 2024, 07:10:03

Dates and versions

hal-03792668, version 1 (30-09-2022)

hal-03792668, version 2 (31-01-2023)

Identifiers

HAL Id: hal-03792668, version 1
arXiv: 2210.00895

Cite

Antoine Barrier, Aurélien Garivier, Gilles Stoltz. On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits. 2022. ⟨hal-03792668v1⟩

209 views

133 downloads
