Active Learning for Semi-Supervised K-Means Clustering
Abstract
The K-Means algorithm is one of the most widely used clustering algorithms for Knowledge Discovery in Data Mining. Seed-based K-Means integrates a small set of labeled data (called seeds) into the K-Means algorithm to improve its performance and overcome its sensitivity to the initial centers, which are, most of the time, generated at random; alternatively, the authors assume that seeds are available for each cluster. This paper introduces a new, efficient algorithm for active seed selection that relies on a Min-Max approach favoring coverage of the whole dataset. Experiments conducted on artificial and real datasets show that, for each dataset, our active seed selection algorithm collects seeds such that each cluster has at least one seed after a very small number of queries. Using the collected seeds also reduces the number of iterations K-Means needs to converge, which is crucial in many KDD applications.
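As an illustration of the Min-Max idea mentioned in the abstract, the sketch below uses a plain farthest-first traversal: each query goes to the point whose distance to its nearest already-queried point is largest, and querying stops once every cluster has a seed. This is an assumption about what "Min-Max" means here, not the paper's exact algorithm; the function name `min_max_seed_queries`, the `oracle` callable (standing in for the human expert), and the toy data are all hypothetical.

```python
import math
import random


def min_max_seed_queries(points, oracle, n_clusters, rng=None):
    """Query labels farthest-first until every cluster has at least one seed.

    points: list of coordinate tuples.
    oracle: hypothetical callable mapping a point index to its cluster label.
    Returns (seeds, n_queries), where seeds maps label -> list of point indices.
    """
    rng = rng or random.Random(0)
    first = rng.randrange(len(points))
    # min_dist[i] = distance from point i to its nearest already-queried point
    min_dist = [math.dist(p, points[first]) for p in points]
    seeds = {oracle(first): [first]}
    queried = {first}
    while len(seeds) < n_clusters and len(queried) < len(points):
        # Min-Max step: among unqueried points, take the one whose
        # minimum distance to the queried set is maximal.
        candidates = (i for i in range(len(points)) if i not in queried)
        nxt = max(candidates, key=lambda i: min_dist[i])
        queried.add(nxt)
        seeds.setdefault(oracle(nxt), []).append(nxt)
        # Update nearest-queried distances with the newly queried point.
        min_dist = [min(m, math.dist(p, points[nxt]))
                    for m, p in zip(min_dist, points)]
    return seeds, len(queried)


# Toy data: three well-separated groups with known (hidden) labels.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.8, 0.3)]
labels = [0, 0, 1, 1, 2, 2]
seeds, n_queries = min_max_seed_queries(points, lambda i: labels[i], n_clusters=3)
```

The collected seeds could then be used to initialize the K-Means centers, for example by averaging the seed points of each label. On well-separated data such as this toy set, farthest-first querying tends to reach every cluster in few queries, which is the behavior the abstract reports.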
Domains
Artificial Intelligence [cs.AI]
Main file
Active_Learning_for_Semi_Supervised Active learning for semi-supervised k-means clustering- Vu-Labroche.pdf (175.59 KB)
Origin | Files produced by the author(s) |
---|