Community-based methods for large-scale relational data analysis and applications
Dissertation (open access)

Zheng Chen
Doctor of Philosophy (Ph.D.), Drexel University
May 2020
DOI: https://doi.org/10.17918/00000232

Abstract

Keywords: Anomaly detection (Computer security); Neural networks (Computer science); Big Data; Stochastic Models
Data mining and machine learning with large-scale data are interesting and popular research topics in both academia and industry. Leveraging large data has proven helpful for a wide variety of problems and applications, and the area is still under active development. Large data introduces challenges to almost every aspect of data mining and machine learning, including performance, efficiency, scalability, and deployment. In this dissertation, we focus on relational data and develop a community-based framework capable of handling large-scale relational data through a "divide-and-conquer" strategy; we also incorporate techniques such as pre-training and sampling to improve performance, generality, and scalability. Relational data is ubiquitous: it is not limited to graph data with explicit links, such as social networks or knowledge bases, because any data described by feature vectors can have underlying relations defined by similarities or correlations. Real-world relational data tends to have a topology with underlying community-like structures, which we can take advantage of. This thesis develops methods that solve sub-problems on partitions of a large dataset and then merge the results when necessary.
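The abstract describes the divide-and-conquer strategy only at a high level. The following is a minimal Python sketch of that general idea, not the dissertation's actual method. It assumes that feature vectors are turned into a graph by thresholding cosine similarity, that plain label propagation serves as the community detector, and that a per-community centroid stands in for the sub-problem. All function names (`similarity_graph`, `label_propagation`, `divide_and_conquer`, `centroid`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, TypeVar

R = TypeVar("R")
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def similarity_graph(points: List[Vector], threshold: float) -> Dict[int, List[int]]:
    """Assumption: link two points whenever their cosine similarity reaches `threshold`."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if cosine(points[i], points[j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def label_propagation(adj: Dict[int, List[int]], max_iter: int = 20) -> Dict[int, int]:
    """Assumed community detector: each node adopts its neighbours' most common label."""
    labels = {node: node for node in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps the sketch deterministic
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)  # tie-break: smallest label
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels


def divide_and_conquer(
    points: List[Vector], solve: Callable[[List[Vector]], R], threshold: float
) -> Dict[int, R]:
    """Partition points into communities, solve each one, and merge into a dict."""
    labels = label_propagation(similarity_graph(points, threshold))
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in labels.items():
        groups[lab].append(idx)
    # Each community is solved independently; the "merge" here is a dict keyed by community.
    return {lab: solve([points[i] for i in members]) for lab, members in groups.items()}


def centroid(members: List[Vector]) -> List[float]:
    """Stand-in sub-problem: the mean feature vector of one community."""
    return [sum(col) / len(members) for col in zip(*members)]


points = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
result = divide_and_conquer(points, centroid, threshold=0.9)
```

With a threshold of 0.9, the two well-separated groups of points form two communities, and the sketch returns one centroid per community. The all-pairs loop in `similarity_graph` is quadratic and only illustrative. Because each community is solved independently, the per-community calls could in principle run in parallel.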