cluster analysis - Data clustering algorithm

July 15, 2013

What is a popular text clustering algorithm that deals with high dimensionality and huge datasets, and is fast? I am getting confused after reading so many papers and so many approaches. Now I just want to know which one is used most, so that I have a good starting point for writing a clustering application for documents.

To deal with the curse of dimensionality you can try to determine the blind sources (i.e. topics) that generated your dataset. You could use principal component analysis or factor analysis to reduce the dimensionality of your feature set and to compute useful indexes.

PCA is what is used in latent semantic indexing, since SVD (applied to mean-centered data) can be demonstrated to be PCA : )
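To make the SVD/PCA connection concrete, here is a minimal NumPy sketch. The tiny document-term matrix and the function names `pca_via_svd` and `pca_via_covariance` are made up for illustration and are not from any particular library. It computes the principal components once from the SVD of the centered data and once from the eigen-decomposition of the covariance matrix, so you can check that the two agree up to sign. Note that LSI in practice is usually run on the uncentered (often TF-IDF weighted) matrix, so treat this as a sketch of the relationship rather than a full LSI pipeline.

```python
import numpy as np

# Toy document-term count matrix (rows = documents, columns = terms).
# The numbers are invented purely for illustration.
X = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 3],
    [1, 1, 1, 1],
], dtype=float)


def pca_via_svd(X, k):
    """Top-k principal components obtained from the SVD of the centered data."""
    Xc = X - X.mean(axis=0)                 # centering is what links SVD to PCA
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                     # principal axes, shape (k, n_terms)
    explained_var = s[:k] ** 2 / (X.shape[0] - 1)
    scores = Xc @ components.T              # documents in the reduced space
    return scores, components, explained_var


def pca_via_covariance(X, k):
    """Top-k principal components from the covariance eigen-decomposition."""
    cov = np.cov(X, rowvar=False)           # np.cov centers the columns itself
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # keep the k largest
    return eigvals[order], eigvecs[:, order].T


scores, components, explained_var = pca_via_svd(X, k=2)
eigvals, eigvecs = pca_via_covariance(X, k=2)

print("variance from SVD:       ", explained_var)
print("variance from covariance:", eigvals)
print("reduced documents:\n", scores)
```

The rows of `scores` are the documents in the reduced 2-dimensional space, which is what you would hand to a clustering step afterwards.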

Remember that you can lose interpretability when you obtain the principal components or the factors of your dataset, so maybe you want to go the non-negative matrix factorization route. (And here is the punch line: k-means is a particular case of NNMF!) In NNMF the dataset can be explained by its additive, non-negative components.
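As a rough illustration of the NNMF route, the sketch below factors a small non-negative document-term matrix `V` into `W @ H` using the classic multiplicative update rules for the squared-error objective. The function name `nmf`, the toy data, the iteration count and the choice of taking the largest weight per document as its cluster are all assumptions made for this example, not a recommendation of a specific implementation. For real datasets you would normally use a library implementation and sparse matrices.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate non-negative V (docs x terms) as W @ H with W, H >= 0.

    Multiplicative updates for the Frobenius-norm objective; eps avoids
    division by zero. This is a sketch, not a tuned implementation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy document-term counts (invented): two obvious topics plus one mixed doc.
V = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 3],
    [1, 1, 1, 1],
], dtype=float)

W, H = nmf(V, k=2)

# Remove the scale ambiguity: make each topic (row of H) sum to 1 and
# move the scale into the document weights, so W @ H is unchanged.
scale = H.sum(axis=1, keepdims=True)
H_norm = H / scale
W_scaled = W * scale.T

# Hard assignment: each document goes to its strongest additive component.
labels = W_scaled.argmax(axis=1)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print("topics (rows of H):\n", H_norm)
print("document weights:\n", W_scaled)
print("cluster labels:", labels)
print("relative reconstruction error:", rel_error)
```

Each row of `H_norm` is a topic expressed as non-negative term weights, and each row of `W_scaled` says how much of each topic a document contains, which is what keeps the result interpretable compared with PCA components that mix positive and negative loadings.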
