ODYS: A Massively-Parallel Search Engine Using a DB-IR Tightly-Integrated Parallel DBMS

Kyu-Young Whang; Tae-Seob Yun; Yeon-Mi Yeo; Il-Yeol Song; Hyuk-Yoon Kwon; In-Joong Kim

doi:10.48550/arxiv.1208.4270

Preprint

Open access

arXiv.org

21 Aug 2012

Files and links (1)

URL: https://doi.org/10.48550/arxiv.1208.4270

Preprint (Author's original); arXiv.org - Non-exclusive license to distribute; Open

Abstract

Computer Science - Databases

Recently, parallel search engines have been implemented based on scalable distributed file systems such as Google File System. However, we claim that building a massively-parallel search engine using a parallel DBMS can be an attractive alternative since it supports a higher-level (i.e., SQL-level) interface than that of a distributed file system for easy and less error-prone application development while providing scalability. In this paper, we propose a new approach of building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS and demonstrate its commercial-level scalability and performance. In addition, we present a hybrid (i.e., analytic and experimental) performance model for the parallel search engine. We have built a five-node parallel search engine according to the proposed architecture using a DB-IR tightly-integrated DBMS. Through extensive experiments, we show the correctness of the model by comparing the projected output with the experimental results of the five-node engine. Our model demonstrates that ODYS is capable of handling 1 billion queries per day (81 queries/sec) for 30 billion web pages by using only 43,472 nodes with an average query response time of 211 ms, which is equivalent to or better than those of commercial search engines. We also show that, by using twice as many (86,944) nodes, ODYS can provide an average query response time of 162 ms, which is significantly lower than those of commercial search engines.
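The throughput figures in the abstract can be checked with a little arithmetic. The sketch below uses only numbers quoted in the abstract. It shows that 1 billion queries per day averages far more than 81 queries/sec in aggregate, so the parenthetical rate cannot be the system-wide figure. One possible reading, which is an assumption and is not stated in the abstract, is that 81 queries/sec is the load carried by each of about 143 replicated groups of nodes; the variable `assumed_groups` is hypothetical and exists only to test that reading.

```python
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

# Figures quoted in the abstract
queries_per_day = 1_000_000_000
stated_rate = 81                  # queries/sec, as given in parentheses
nodes = 43_472

# Average system-wide rate implied by 1 billion queries per day
aggregate_rate = queries_per_day / SECONDS_PER_DAY

# How many times larger the aggregate rate is than the stated rate
ratio = aggregate_rate / stated_rate

# ASSUMPTION (not in the abstract): the ratio is a count of replicated node groups
assumed_groups = round(ratio)
nodes_per_group, leftover = divmod(nodes, assumed_groups)

print(f"{aggregate_rate:,.0f} queries/sec averaged over a day")
print(f"aggregate rate / stated rate = {ratio:.1f}")
print(f"{nodes} nodes / {assumed_groups} groups = {nodes_per_group} remainder {leftover}")
```

The node count divides evenly by the rounded ratio, which is consistent with the per-group reading but does not confirm it; the full paper is the authority on how the 81 queries/sec figure is defined.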

Details

Title: ODYS: A Massively-Parallel Search Engine Using a DB-IR Tightly-Integrated Parallel DBMS
Creators: Kyu-Young Whang; Tae-Seob Yun; Yeon-Mi Yeo; Il-Yeol Song; Hyuk-Yoon Kwon; In-Joong Kim
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Information Science
Other Identifier: 991021806422904721
