Dissertation
Fundamental limits of communication in distributed computation frameworks
Doctor of Philosophy (Ph.D.), Drexel University
May 2018
DOI:
https://doi.org/10.17918/D8S96X
Abstract
This dissertation develops a method for integrating information theoretic principles in distributed computing frameworks, distributed learning, and database design. In particular, it incorporates compression coding in such a way as to accelerate the computation of statistical functions of the data in distributed computing frameworks. Moreover, it studies the limits of decentralized compressors for query processing which speeds up computations by requiring less network traffic and efficient memory usage, and proposes a rate-distortion principle to gain some insight in distributed learning algorithms. The dissertation's insights into tradeoffs in practical distributed computing are garnered by an information theoretic model for decentralized function computation. In decentralized function computation, a series of two or more networked nodes, each storing or measuring local observation sequences, separately encode their observations, and communicate interactively with a sink, hence-forth the central estimation officer (CEO), with the goal of enabling it to compute some function or statistic of their combined data across each element in the sequence. Of great interest in this problem is the rate-distortion region, describing all possible vectors of rates, determining the sizes of each of the messages sent for which there exist encoders at the observation nodes and a decoder not exceeding a specified distortion. We assemble this problem formulation to model the cache formation design in distributed shared memory systems for query processing. Unlike traditional designs that focus on a particular query (e.g., search, retrieval, similarity), we aim to couple the representation extraction with compression and generalize the query to any types of function. We study the single letter description of the achievable region of trade-off between the storage size versus accuracy (general distortion metric) and storage size versus the cost of cache miss (log-loss distortion) for finite-alphabet sources. Another main source of difficulty after obtaining the rate-distortion expression is finding practical schemes and computing the region. The second part of the dissertation is focused on designing optimal encoding functions. With the aim of enabling the master node to compute the extrema function, the fundamental lower bound on the information exchange rate required over all quantization schemes, both scalar and vector, is computed for this interactive problem with a known iterative convex geometric method. Next, an optimal dynamic program achieving the minimum expected rate and expected rate delay tradeoff over all scalar quantization schemes is presented, and the benefits of enabling nodes to overhear each others' messages is assessed. Furthermore, a series of substantially reduced complexity dynamic programs are shown, both theoretically and empirically, to obtain performance close to the fundamental limits, and to scale favorably as the number of nodes grow. The final chapter of the dissertation proposes a novel method to design practical codes for the dissertation's accelerated distributed computing model by interactively training multi-task neural networks. This deep learning model is able to design the memory content to maximize cache hits even if the query is not known in advance. This use of deep neural networks to design codes for interactive distributed cache formation is inspired by a popular recent insight, known as the information bottleneck, that activations of the nodes at each layer of a neural network can be reinterpreted as a sufficient statistic compressing the information provided by the previous layer about the ultimate classification problem. Multiple experiments show that our neural cache formation machine can learn to compute a set of possible query functions from the cache contents by caching a learned meaningful representation of the data universal across these query tasks.
Metrics
51 File views/ downloads
24 Record Views
Details
- Title
- Fundamental limits of communication in distributed computation frameworks
- Creators
- Solmaz Torabi Ardakani - DU
- Contributors
- John MacLaren Walsh (Advisor) - Drexel University (1970-)
- Awarding Institution
- Drexel University
- Degree Awarded
- Doctor of Philosophy (Ph.D.)
- Publisher
- Drexel University; Philadelphia, Pennsylvania
- Number of pages
- ix, 115 pages
- Resource Type
- Dissertation
- Language
- English
- Academic Unit
- College of Engineering (1970-2026); Electrical (and Computer) Engineering (1970-2026); Drexel University
- Other Identifier
- 8188; 991014632410504721