Journal article
Exploiting Temporal Characteristics of Features for Effectively Discovering Event Episodes From News Corpora
Journal of the Association for Information Science and Technology, v 65(3), pp 621-634
01 Mar 2014
Abstract
An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency x Inverse Document Frequency(Tempo) (TFxIDF(Tempo)) and TFxEnhanced-IDF(Tempo), and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TFxIDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TFxEnhanced-IDF(Tempo) significantly improves the effectiveness of event episode discovery when compared with the use of TFxIDF(Tempo).
Details
- Title
- Exploiting Temporal Characteristics of Features for Effectively Discovering Event Episodes From News Corpora
- Creators
- Chih-Ping Wei - National Taiwan University
- Yen-Hsien Lee - National Chiayi University
- Yu-Sheng Chiang - IBM China Development Lab; IBM Taiwan; 11F, Bldg. E, 19-11 SanChong Road, NanKang Dist. Taipei 115 Taiwan ROC
- Chun-Ta Chen - Chunghwa Telecom
- Christopher C. C. Yang - Drexel University
- Publication Details
- Journal of the Association for Information Science and Technology, v 65(3), pp 621-634
- Publisher
- Wiley
- Number of pages
- 14
- Grant note
- NSC 97-2752-H-007-003-PAE; NSC 100-2410-H-002-021-MY3 / National Science Council of the Republic of China; Ministry of Science and Technology, Taiwan
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000335582300014
- Scopus ID
- 2-s2.0-84900519879
- Other Identifier
- 991019182768104721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Collaboration types
- Industry collaboration
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Information Systems
- Information Science & Library Science