Journal article
Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engine
World wide web (Bussum), v 22(6), pp 2437-2467
01 Nov 2019
Abstract
We propose two-dimensional indexing, a novel in-memory indexing architecture that operates over the distributed memory of a massively-parallel search engine. The goal of two-dimensional indexing is to provide a one-integrated-memory view, as in a single-node system using one large integrated memory. In two-dimensional indexing, we partition the entire index into n × m fragments and distribute them over the memories of multiple nodes in such a way that each fragment is stored entirely in the main memory of one node. The proposed architecture is not only scalable, as it uses a scaled-out shared-nothing architecture, but is also capable of achieving low query response time, as it processes queries in main memory. We also propose the concept of the one-memory point, which is the amount of memory space required to store the entire index completely in main memory, providing a one-integrated-memory view. We first prove the effectiveness of two-dimensional indexing with single-keyword queries and then extend the notion to handle multiple-keyword queries. To handle multiple-keyword queries, we adopt pre-join, which materializes a multiple-keyword query a priori, as well as a new notion of semi-memory join, which obviates the extensive communication overhead of performing a join across multiple nodes. In experiments using a real-life search query set over a database of 100 million crawled Web documents, we show that two-dimensional indexing can effectively provide a one-integrated-memory view without much additional memory compared with a single-node system using one large integrated memory. We also show that, with a six-node prototype, in an ideal case, it improves query processing performance by up to 535.54 times over a disk-based search engine with an equivalent amount of in-memory buffer but without two-dimensional indexing. This improvement is expected to grow as the system is scaled out with a larger number of machines.
Details
- Title
- Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engine
- Creators
- Tae-Seob Yun - Korea Advanced Institute of Science and Technology; Kyu-Young Whang - Korea Advanced Institute of Science and Technology; Hyuk-Yoon Kwon - Seoul National University of Science and Technology; Jun-Sung Kim - Korea Advanced Institute of Science and Technology; Il-Yeol Song - Drexel University
- Publication Details
- World wide web (Bussum), v 22(6), pp 2437-2467
- Publisher
- Springer Nature
- Number of pages
- 31
- Grant note
- 2016R1A2B4015929 / National Research Foundation of Korea (NRF) - Korean Government (MSIT)
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000504322400008
- Scopus ID
- 2-s2.0-85056456299
- Other Identifier
- 991019168436004721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Information Systems
- Computer Science, Software Engineering