Cluster analysis - Data clustering algorithm
Which popular text clustering algorithm can handle high dimensionality and a huge dataset, and is also fast? I'm getting confused after reading so many papers and so many approaches. Now I just want to know which one is used most, so that I have a good starting point for writing a clustering application for documents.
To deal with the curse of dimensionality, you can try to determine the blind sources (i.e. topics) that generated your dataset. You could use principal component analysis or factor analysis to reduce the dimensionality of your feature set and to compute useful indexes.
PCA is essentially what latent semantic indexing uses, since SVD applied to mean-centered data can be shown to be equivalent to PCA :)
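To make the SVD/PCA connection concrete, here is a minimal sketch using NumPy. The toy document-term matrix and the helper names `lsi_embed` and `pca_variances_via_svd` are made up for illustration and are not from any particular library. It projects documents onto the top singular directions (LSI-style), then checks that the SVD of the centered data gives the same component variances as the eigenvalues of the covariance matrix.

```python
import numpy as np

# Toy document-term count matrix: rows are documents, columns are terms.
# The values are invented for illustration only.
X = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 3],
    [1, 1, 1, 1],
], dtype=float)

def lsi_embed(X, k):
    """Project documents onto the top-k singular directions (LSI-style)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Document coordinates in the reduced space, and the k "topic" directions.
    return U[:, :k] * s[:k], Vt[:k]

def pca_variances_via_svd(X):
    """Principal component variances, computed from the SVD of centered data."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / (X.shape[0] - 1)

docs_2d, topics = lsi_embed(X, k=2)
pc_var = pca_variances_via_svd(X)

# Classic PCA route: eigenvalues of the covariance matrix, largest first.
cov_eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(pc_var, cov_eig, atol=1e-8))
```

For a real corpus you would start from a sparse TF-IDF matrix and use a truncated SVD routine rather than a full dense SVD, but the idea is the same.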
Remember that you can lose interpretability when you obtain the principal components or the factors of your dataset, so you may want to go the non-negative matrix factorization route. (And here is the punch line! K-means can be seen as a particular, constrained case of NNMF!) In NNMF the dataset is explained by additive, non-negative components only.
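As a rough illustration of the NNMF idea, here is a minimal sketch using NumPy and the standard multiplicative update rules for the squared-error objective. The function name `nmf`, its parameters, and the toy matrix are assumptions made for this example. It is a sketch, not a tuned implementation.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative X (docs x terms) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps   # document-to-topic weights
    H = rng.random((k, m)) + eps   # topic-to-term weights
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy document-term counts, invented for illustration only.
X = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 3],
    [1, 1, 1, 1],
], dtype=float)

W, H = nmf(X, k=2)

# A crude hard clustering: assign each document to its heaviest topic.
labels = W.argmax(axis=1)
print(labels)
print(np.linalg.norm(X - W @ H))
```

Taking the argmax over each row of `W` is what turns the soft topic weights into hard cluster labels, which is where the link to k-means comes from: k-means forces each document to belong to exactly one component.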