The relational DBMS (RDBMS) has been widely used because it supports high-level
functionality such as SQL, schemas, indexes, and transactions that the O/S file
system lacks. However, the recent advent of big data technology has spurred the
development of new systems that sacrifice DBMS functionality in order to manage
large-scale data efficiently. These so-called NoSQL systems use a distributed
file system (DFS) to achieve scalability and reliability: they scale by storing
data across a large number of low-cost commodity machines, and they ensure
reliability by storing replicas of the data. The drawback is that they do not adequately
support high-level DBMS functionality. In this paper, we propose an
architecture of a DBMS that uses the DFS as its storage. With this novel
architecture, the DBMS supports the scalability and reliability of the DFS as
well as the high-level functionality of a DBMS. It can thus utilize the virtually
unlimited storage space provided by the DFS, which makes it suitable for big data
analytics. As part of this architecture, we
propose the notion of the meta DFS file, which allows the DBMS to use the DFS as
its storage, and an efficient transaction management method that includes
recovery and concurrency control. We implement this architecture in Odysseus/DFS,
which integrates the DFS with the Odysseus relational DBMS, a system that has
been under development at KAIST for over 24 years. Our transaction-processing
experiments show that, owing to its high-level functionality, Odysseus/DFS
outperforms HBase, a representative open-source NoSQL system. We also show that
the performance of Odysseus/DFS is comparable to, or only marginally below, that
of an RDBMS with local storage, indicating that the overhead of using the DFS as
storage to support scalability is not significant.
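The abstract does not say how the meta DFS file is organized. As a purely illustrative sketch, assume the underlying DFS behaves like HDFS, which allows appends but not in-place overwrites of file contents. One generic way to give a page-based DBMS updatable pages on such storage is an indirection table that maps each page to its most recently appended copy. All names here (`AppendOnlyFile`, `PageMappedFile`, `PAGE_SIZE`, the rollover threshold) are hypothetical stand-ins for real DFS calls. This is not the Odysseus/DFS design.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # hypothetical DBMS page size in bytes


@dataclass
class AppendOnlyFile:
    """Hypothetical stand-in for a DFS file: appends allowed, no overwrites."""
    data: bytearray = field(default_factory=bytearray)

    def append(self, chunk: bytes) -> int:
        offset = len(self.data)
        self.data.extend(chunk)
        return offset  # where the chunk now starts

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])


class PageMappedFile:
    """Hypothetical page-addressable file layered over append-only files.

    Every page write appends a full page to the newest underlying file, and
    a page table records where the latest version of each page lives.
    """

    def __init__(self, max_pages_per_file: int = 4):
        self.max_pages_per_file = max_pages_per_file
        self.files: list[AppendOnlyFile] = [AppendOnlyFile()]
        self.page_table: dict[int, tuple[int, int]] = {}  # page_id -> (file, offset)

    def write_page(self, page_id: int, payload: bytes) -> None:
        if len(payload) > PAGE_SIZE:
            raise ValueError("payload larger than a page")
        current = self.files[-1]
        if len(current.data) >= self.max_pages_per_file * PAGE_SIZE:
            current = AppendOnlyFile()  # roll over to a fresh underlying file
            self.files.append(current)
        offset = current.append(payload.ljust(PAGE_SIZE, b"\x00"))
        self.page_table[page_id] = (len(self.files) - 1, offset)

    def read_page(self, page_id: int) -> bytes:
        file_index, offset = self.page_table[page_id]
        return self.files[file_index].read(offset, PAGE_SIZE)


if __name__ == "__main__":
    meta = PageMappedFile()
    meta.write_page(7, b"version 1")
    meta.write_page(7, b"version 2")  # an "update" appends a new copy
    print(meta.read_page(7).rstrip(b"\x00"))  # b'version 2'
```

Appending instead of overwriting keeps every write compatible with append-only storage. The cost is that stale page versions accumulate and would eventually need compaction, which this sketch omits, along with replication, recovery, and concurrency control.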
Odysseus/DFS: Integration of DBMS and Distributed File System for Transaction Processing of Big Data