Publication page

Near-optimal large-scale k-medoids clustering

Authors: Ushakov A.V., Vasilyev I.

Journal: Information Sciences

Volume: 545


Year: 2021

Reporting year: 2020

Abstract: The k-medoids (k-median) problem is one of the best-known unsupervised clustering problems. Due to its complexity, finding high-quality solutions for huge-scale datasets remains extremely challenging. Many approaches that find optimal or high-quality solutions are applicable only to small and medium-size instances. On the other hand, many parallel, distributed algorithms that can handle huge-scale datasets usually provide very poor solutions. In this paper, we develop the first parallel, distributed primal–dual heuristic algorithm for the k-medoids problem. Its main component is a very efficient parallel subgradient column generation procedure that solves a Lagrangian dual problem and finds a tight bound on solution quality. High-quality solutions are then produced by a parallel core selection technique. We considerably reduce the computational burden and memory load by employing a nearest neighbor strategy to approximate the dissimilarity matrix. We demonstrate that our algorithm finds near-optimal solutions, as confirmed by the tightness of the dual bounds, for instances much larger than those considered in the literature to date. Our experiments include clustering large-scale collections of face images into several thousand clusters. We show that our approach outperforms parallel improved versions of the most popular k-medoids clustering algorithms and achieves nearly linear parallel speedup.
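To illustrate what a Lagrangian dual bound for the k-medoids problem looks like, here is a minimal single-process sketch in Python. It assumes the standard p-median integer program (each point is assigned to exactly one of k open medoids), relaxes the assignment constraints with multipliers lam[i], and improves the bound with plain subgradient steps using a Polyak step size. This is a textbook illustration, not the authors' parallel subgradient column generation or core selection method: it uses a dense dissimilarity matrix instead of a nearest neighbor approximation, and all function names here are hypothetical.

```python
import math
import random


def kmedoids_cost(d, medoids):
    """k-medoids objective: every point pays the distance to its closest medoid."""
    return sum(min(row[j] for j in medoids) for row in d)


def lagrangian_bound(d, k, lam):
    """Solve the Lagrangian relaxation of the p-median model for fixed multipliers.

    Relaxing sum_j x_ij = 1 with multiplier lam[i] leaves a problem that splits by
    candidate j: opening j contributes rho[j] = sum_i min(0, d[i][j] - lam[i]).
    Returns (lower bound, the k chosen candidates, a subgradient).
    """
    n = len(d)
    rho = [sum(min(0.0, d[i][j] - lam[i]) for i in range(n)) for j in range(n)]
    chosen = sorted(range(n), key=lambda j: rho[j])[:k]  # k most negative rho values
    bound = sum(lam) + sum(rho[j] for j in chosen)
    # Subgradient g_i = 1 - (number of chosen candidates point i is assigned to).
    g = [1 - sum(1 for j in chosen if d[i][j] < lam[i]) for i in range(n)]
    return bound, chosen, g


def subgradient_kmedoids(d, k, iters=300, theta=2.0):
    """Maximize the Lagrangian dual with Polyak steps; return (best LB, best UB, medoids)."""
    n = len(d)
    # Hypothetical starting point: each multiplier is the distance to the nearest other point.
    lam = [min(d[i][j] for j in range(n) if j != i) for i in range(n)]
    best_lb, best_ub, best_med = -math.inf, math.inf, None
    for _ in range(iters):
        lb, chosen, g = lagrangian_bound(d, k, lam)
        best_lb = max(best_lb, lb)
        ub = kmedoids_cost(d, chosen)  # the relaxed solution's medoids are primal-feasible
        if ub < best_ub:
            best_ub, best_med = ub, chosen
        norm2 = sum(x * x for x in g)
        if norm2 == 0 or best_ub - best_lb < 1e-9:
            break  # zero subgradient or closed gap: best_med is optimal
        step = theta * (best_ub - lb) / norm2
        lam = [lam[i] + step * g[i] for i in range(n)]
        theta *= 0.97  # slowly shrink the step-size factor
    return best_lb, best_ub, best_med


random.seed(0)
points = [(random.random(), random.random()) for _ in range(12)]
d = [[math.dist(p, q) for q in points] for p in points]
lb, ub, medoids = subgradient_kmedoids(d, k=3)
print(f"lower bound {lb:.4f} <= optimum <= upper bound {ub:.4f}; medoids {sorted(medoids)}")
```

Every value returned by lagrangian_bound is a valid lower bound on the optimal cost, so the gap between the best lower and upper bounds certifies how far the returned medoids can be from optimal. This is the kind of quality guarantee the abstract attributes to tight dual bounds.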

Indexed in Web of Science: Yes

Indexed in Scopus: Yes

Indexed in RSCI (Russian Science Citation Index): No

In press: No