Journal article
Studying the Clustering Paradox and Scalability of Search in Highly Distributed Environments
ACM transactions on information systems, v 31(2)
01 May 2013
Abstract
With the ubiquitous production, distribution and consumption of information, today's digital environments such as the Web are increasingly large and decentralized. It is hardly possible to obtain central control over information collections and systems in these environments. Searching for information in these information spaces has brought about problems beyond traditional boundaries of information retrieval (IR) research. This article addresses one important aspect of scalability challenges facing information retrieval models and investigates a decentralized, organic view of information systems pertaining to search in large-scale networks. Drawing on observations from earlier studies, we conduct a series of experiments on decentralized searches in large-scale networked information spaces. Results show that how distributed systems interconnect is crucial to retrieval performance and scalability of searching. Particularly, in various experimental settings and retrieval tasks, we find a consistent phenomenon, namely, the Clustering Paradox, in which the level of network clustering (semantic overlay) imposes a scalability limit. Scalable searches are well supported by a specific, balanced level of network clustering emerging from local system interconnectivity. Departure from that level, either stronger or weaker clustering, leads to search performance degradation, which is dramatic in large-scale networks.
Details
- Title
- Studying the Clustering Paradox and Scalability of Search in Highly Distributed Environments
- Creators
- Weimao Ke - Drexel University
- Javed Mostafa - University of North Carolina at Chapel Hill
- Publication Details
- ACM transactions on information systems, v 31(2)
- Publisher
- Association for Computing Machinery
- Number of pages
- 36
- Grant note
- UL1RR025747 / National Center for Research Resources; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA; NIH National Center for Research Resources (NCRR)
- W. Ke's startup fund in the College of Information Science and Technology at Drexel University
- UL1TR000083 / National Center for Advancing Translational Sciences; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA; NIH National Center for Advancing Translational Sciences (NCATS)
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000319209200003
- Scopus ID
- 2-s2.0-84878608184
- Other Identifier
- 991019167417504721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Computer Science, Information Systems