Valerio-Martínez, Raúl O.
Performance of a novel algorithm for clustering (hdbscan) applied to covariance matrices (2018)
Valerio-Martínez, Raúl O.; Torres-Saavedra, Pedro A.; Santana-Morant, Dámaris; Rolke, Wolfang; Santivañez, José
College of Arts and Sciences - Sciences; Department of Mathematics
Advances in computing have made it much easier to use grouping algorithms to divide and classify information about many phenomena. However, the efficiency and performance of these algorithms must be measured to identify the best clustering methods. The literature offers several clustering and classification algorithms, such as k-nearest neighbors (K-NN) and k-means. More recently, HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), a hierarchical density-based clustering algorithm, has been proposed. It can detect arbitrarily shaped clusters and requires fewer user-specified parameters than its competitors to achieve a good classification. Simulation studies have shown that it outperforms its competitors when clustering objects described by several features. However, HDBSCAN has not been used to cluster covariance matrices. This thesis therefore proposes using HDBSCAN to cluster covariance matrices, a task with potential applications in areas such as time series analysis and image processing. The performance of HDBSCAN is compared with that of DBSCAN, K-NN, and k-means through simulation studies. The simulation scenarios mainly vary the sample size (number of matrices), the number of clusters, the size of the matrices, and the distance metric. The relevance of this study is that, to the best of our knowledge, HDBSCAN has not previously been implemented for clustering covariance matrices. Because the distance metric is one of the factors with a large influence on the performance of a clustering algorithm, this work also reviews distance metrics between matrices.
In particular, this thesis considers the affine-invariant Riemannian metric (AIRM) to calculate the distance between symmetric positive definite (SPD) matrices, and compares it with other popular distance metrics for matrices. The simulation studies suggest that combining the AIRM distance with HDBSCAN has the highest computational cost for large collections of matrices. Nonetheless, this combination is effective for clustering high-dimensional matrices. K-means and HDBSCAN give comparable results for a small number of clusters and high-dimensional covariance matrices. However, when the input parameters of the algorithms change, the purity values for HDBSCAN change little (i.e., HDBSCAN is more robust to changes in its input parameters). In contrast, the performance of k-means deteriorates when its input parameters are changed, an important concern in real-world problems. These findings indicate that HDBSCAN offers the highest robustness and performance of the four algorithms analyzed, a result consistent with previous findings for vectors.
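A minimal sketch of the two quantities the abstract relies on, the AIRM distance and cluster purity, written in Python with NumPy. The language, library, and function names (airm_distance, pairwise_airm, purity) are assumptions for illustration; the abstract does not say how the thesis implemented them. The AIRM distance between SPD matrices A and B is d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F, which equals the square root of the sum of the squared logarithms of the eigenvalues of A^{-1/2} B A^{-1/2}. Purity uses its standard definition (the fraction of objects belonging to the majority true class of their assigned cluster); whether this matches the thesis's exact definition, including how HDBSCAN noise points are counted, is an assumption.

```python
from typing import Sequence

import numpy as np


def airm_distance(a: np.ndarray, b: np.ndarray) -> float:
    """AIRM distance d(a, b) = ||log(a^{-1/2} b a^{-1/2})||_F for SPD a, b."""
    # a^{-1/2} from the eigendecomposition a = V diag(w) V^T
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.T
    # a^{-1/2} b a^{-1/2} is SPD, so all of its eigenvalues are positive
    lam = np.linalg.eigvalsh(a_inv_sqrt @ b @ a_inv_sqrt)
    # For SPD M, ||log(M)||_F equals sqrt(sum(log(lambda_i)^2))
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def pairwise_airm(mats: Sequence[np.ndarray]) -> np.ndarray:
    """Symmetric n x n matrix of AIRM distances (n(n-1)/2 evaluations)."""
    n = len(mats)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = airm_distance(mats[i], mats[j])
    return d


def purity(true_labels: Sequence[int], pred_labels: Sequence[int]) -> float:
    """Fraction of objects that fall in the majority true class of their cluster.

    Assumption: every distinct predicted label, including HDBSCAN's noise
    label -1, is treated as one group.
    """
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    total = 0
    for cluster in np.unique(p):
        # Size of the most common true class inside this predicted cluster
        _, counts = np.unique(t[p == cluster], return_counts=True)
        total += int(counts.max())
    return total / len(t)


if __name__ == "__main__":
    a = np.diag([2.0, 1.0])
    b = np.diag([4.0, 1.0])
    print(airm_distance(a, b))  # ≈ 0.6931, i.e. log(2)
    print(pairwise_airm([a, b, np.eye(2)]).round(4))
    print(purity([0, 0, 1, 1], [5, 5, 5, 7]))  # 0.75
```

Because every pair of matrices needs its own eigendecompositions, pairwise_airm performs n(n-1)/2 distance evaluations for n matrices, which is in line with the higher cost reported for large collections of matrices. The resulting square matrix can be passed to a clusterer that accepts precomputed distances, for example scikit-learn's sklearn.cluster.HDBSCAN (available since version 1.3) or the standalone hdbscan package, both with metric="precomputed"; the parameter settings used in the thesis are not given in the abstract.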