Matrix analytic methods; Computer Science; Data Mining
The clustering problem has been widely studied in data mining and machine learning. It has numerous applications in pattern recognition, information retrieval, image analysis, and bioinformatics. Clustering is a fundamental unsupervised machine learning technique that aims to partition a data set into groups based on the similarity of its members. Recently there has been significant development in the use of non-negative matrix factorization (NMF) methods for various clustering tasks. The method finds two non-negative matrices whose product approximates the original matrix. The non-negativity of the factor matrices is an advantage over other matrix factorization methods because it makes the results much easier to interpret. Moreover, NMF has attracted much attention due to its newly discovered ability to solve challenging data mining and machine learning problems. Studies have proved that NMF is equivalent to kernel k-means and probabilistic latent semantic indexing under some circumstances. Compared with most other clustering methods, NMF has been shown to achieve better or similar clustering results. In this thesis, our primary goal is to study the clustering problem by establishing NMF models that reflect the features of the given data. First, for the case in which the similarity of the data is available, we propose two modified NMF models, one with a constraint (CNMF) and the other with a regularization term (RNMF). We take this situation as an example to show how to model the data information, and we compare these two commonly employed approaches in this simple case. Next, we propose a novel model named augmented nonnegative matrix factorization (ANMF). The novelty of the model is that it incorporates the geometric closeness of the data on both dimensions of the data matrix. In addition to experiments conducted on benchmark data sets, the model is also applied to a real application, namely the CiteULike data set.
Finally, for data sets with sparse features, we propose a new model named sparse regularized non-negative matrix factorization (SpaNMF). This type of data is ubiquitous in applications and has remained a hot topic for many years. Our novelty here is to combine the geometric structure and sparseness of the data. For all four models, we develop numerical algorithms and conduct experiments. The experimental results show the effectiveness of our proposed models compared with state-of-the-art clustering algorithms.
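As a rough illustration of the basic idea the abstract builds on (two non-negative factors whose product approximates the data matrix, with clusters read off one factor), the following is a minimal sketch of plain NMF in Python with NumPy. It is not one of the thesis models (CNMF, RNMF, ANMF, or SpaNMF). It assumes the standard multiplicative update rules for the Frobenius-norm objective, and the function name `nmf_cluster`, its parameters, and the toy matrix are hypothetical choices made only for this example.

```python
import numpy as np

def nmf_cluster(X, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate non-negative X (features x samples) as W @ H.

    Illustrative sketch only: uses the standard multiplicative updates
    for the Frobenius-norm objective, not any model from the thesis.
    Returns W (features x k), H (k x samples), and one cluster label
    per sample (column of X).
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Random strictly positive initialization keeps the updates well defined.
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of both factors.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Assign each sample to the component with the largest coefficient.
    labels = H.argmax(axis=0)
    return W, H, labels

# Toy data (an assumption for illustration): two groups of samples
# that use disjoint sets of features.
X = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
])
W, H, labels = nmf_cluster(X, k=2)
print(labels)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

Because both factors stay non-negative, each row of `H` can be read as a soft membership of the samples in one cluster, which is the interpretability advantage the abstract refers to. The constrained, regularized, augmented, and sparse variants described above would add further terms to the objective and are not reproduced here.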
Details
Title
Exploring Data Clustering with Non-negative Matrix Factorization Models
Creators
Zunyan Xiong - DU
Contributors
Xiaohua Hu (Advisor) - Drexel University (1970-)
Awarding Institution
Drexel University
Degree Awarded
Doctor of Philosophy (Ph.D.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Number of pages
xi, 90 pages
Resource Type
Dissertation
Language
English
Academic Unit
Computer Science (Computing); College of Computing and Informatics; Drexel University
Other Identifier
6612; 991014632730804721