Data reduction techniques for efficient performance monitoring of computing systems
Dissertation   Open access

Salvador Isaiah DeCelles
Doctor of Philosophy (Ph.D.), Drexel University
Dec 2018
DOI: https://doi.org/10.17918/f19f-8k20

Abstract

Subjects: Electrical engineering; Compressed sensing (Telecommunication); Data reduction--Computer programs; Computer Engineering
Vigilance and maintenance are crucial to the continued performance of any computing system. In datacenters, decisions such as resource allocation and fault detection are often made in direct response to information collected in real time. A properly monitored system requires managing a massive flow of data, which adds considerable stress to collection, transmission, analysis and storage. At the server, monitoring competes for resources with the application itself. For the transmission of data from servers to a central monitoring station, bandwidth limitations become an issue. And at the monitoring station, the availability of memory for the storage of data can prove problematic. This thesis develops a set of frameworks aimed at computationally efficient reduction of data for ease of transmission, potential reconstruction and subsequent analysis of the transferred information.

The thesis focuses on a setting in which performance data is collected locally at a server and transferred to a monitoring station for analysis. Methods such as Shannon entropy, dual prediction and compressive sampling are used to reduce data at the server; we employ principal component analysis (PCA) at the monitoring station to detect potential anomalies. Entropy is a scalar measure of a distribution's spread, that is, how dispersed or concentrated the individual data points are with respect to each other. Dual prediction is a two-stage technique. At the first stage, the signal is reduced to a minimal form: the error between a prediction and the actual signal. At the second stage, the signal is restored to its original state by adding this error back to the prediction. This is accomplished through mirrored prediction units stationed at each stage. Compressive sampling is a technique by which a signal is expressed in a different basis, such as the Haar wavelet basis. Under a basis suited to the signal, the data may admit a concise representation; in other words, it becomes effectively sparse. PCA is a dimensionality reduction technique which maps n-dimensional data to a set of orthogonal components known as principal components. Ordered from most to least principal, these components correspond to the directions of highest to lowest variance. Removing the least principal component(s) reduces the dimensionality of the data set with minimal loss of relevant information.

We validate these frameworks via experimentation with data collected in two scenarios. In the first scenario, we generate fault-influenced data using long-running enterprise benchmark applications, Trade6 and RuBBoS. In the second scenario, we use workload traces collected from one of Google's production clusters. Regarding transmission, experiments show that while transferring only 10% of the original signal, we are able to restore the signal with a fidelity between 90% and 95%. For PCA-based detection, our tests reveal that we can achieve detection performance comparable to that of analyzing the raw data, without the cost of restoring the compressively sampled signal, at compression rates exceeding 75%.
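The dual prediction scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the predictor used in the thesis, so the code below assumes a hypothetical last-value predictor (`LastValuePredictor`) and lossless integer residuals; the names `encode` and `decode` are likewise illustrative, not taken from the dissertation.

```python
class LastValuePredictor:
    """Hypothetical predictor (an assumption): the next sample equals the previous one."""

    def __init__(self, initial=0):
        self.last = initial

    def predict(self):
        return self.last

    def update(self, sample):
        self.last = sample


def encode(signal, predictor):
    """Server side: reduce the signal to prediction errors (residuals)."""
    residuals = []
    for sample in signal:
        residuals.append(sample - predictor.predict())
        predictor.update(sample)
    return residuals


def decode(residuals, predictor):
    """Monitoring station: add each error back to the mirrored prediction."""
    restored = []
    for error in residuals:
        sample = predictor.predict() + error
        restored.append(sample)
        predictor.update(sample)
    return restored


# Illustrative CPU-utilisation-like samples (made-up values).
signal = [50, 51, 51, 52, 54, 54, 53]

# Each stage owns its own predictor, initialised identically (the "mirrored" units).
residuals = encode(signal, LastValuePredictor())
restored = decode(residuals, LastValuePredictor())

print(residuals)
print(restored == signal)
```

Because both predictors start from the same state and are updated with the same values, they stay synchronized, and the residuals alone suffice to rebuild the signal. When the predictor tracks the signal well, the residuals are small and cluster near zero, which is what makes them cheaper to transmit than the raw samples.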

