cluster analysis - Data clustering algorithm


Which popular text clustering algorithm deals with high dimensionality and huge datasets, and is fast? I am getting confused after reading so many papers and so many approaches. Now I just want to know which one is used most, so that I have a good starting point for writing a clustering application for documents.

To deal with the curse of dimensionality, you can try to determine the blind sources (i.e. topics) that generated your dataset. You could use principal component analysis or factor analysis to reduce the dimensionality of your feature set and to compute useful indexes.

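PCA is essentially what is used in latent semantic indexing, since PCA can be computed through the SVD of the mean-centered data matrix (LSI usually applies the SVD to the term-document matrix directly, without centering) :)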
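As a rough illustration of the LSI idea, here is a minimal sketch using NumPy's SVD. The vocabulary, the counts, and the helper names `lsi_embed` and `cosine` are made up for this example; a real application would build a (usually TF-IDF weighted, sparse) matrix from actual documents and use a truncated solver instead of a full SVD.

```python
import numpy as np

# Toy document-term count matrix (rows: documents, columns: terms).
# The vocabulary and the counts are invented for illustration only.
terms = ["cluster", "kmeans", "matrix", "goal", "match", "team"]
X = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 0, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 3, 2],
], dtype=float)


def lsi_embed(X, k):
    """Project documents onto the top-k latent dimensions via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k largest singular values; rows are document coordinates.
    return U[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


docs_2d = lsi_embed(X, k=2)
print("same topic:     ", cosine(docs_2d[0], docs_2d[1]))
print("different topic:", cosine(docs_2d[0], docs_2d[3]))
```

The reduced vectors in `docs_2d` can then be fed to whatever clustering algorithm you pick, for example k-means on the low-dimensional coordinates.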

Remember that you can lose interpretability when you obtain the principal components or the factors of your dataset, so you may want to go the non-negative matrix factorization (NMF) route. (And here is the punch: k-means can be seen as a particular, constrained NMF!) In NMF the dataset is explained by additive, non-negative components.
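To make the non-negative factorization a bit more concrete, here is a small sketch using the classic multiplicative update rules for the squared-error objective. The toy matrix, the function name `nmf`, and the parameters `n_iter`, `seed` and `eps` are assumptions for this example, not something from a particular library. Each document is assigned to the component with the largest weight, which is one simple way to read clusters off the factorization.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (n x m) as W (n x k) @ H (k x m).

    Uses multiplicative updates, which keep W and H non-negative
    as long as they start non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy document-term counts (rows: documents), invented for illustration.
X = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 0, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 3, 2],
], dtype=float)

W, H = nmf(X, k=2)
# Hard assignment: the component with the largest weight per document.
labels = W.argmax(axis=1)
print(labels)
```

Rows of `H` can be read as topics (non-negative weights over terms) and rows of `W` as how much of each topic a document contains. Forcing each row of `W` to have a single non-zero entry is what brings this close to k-means.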

