Conference proceeding
Feature reinforcement approach to poly-lingual text categorization
ASIAN DIGITAL LIBRARIES: LOOKING BACK 10 YEARS AND FORGING NEW FRONTIERS, PROCEEDINGS, v 4822
01 Jan 2007
Abstract
With the rapid emergence and proliferation of the Internet and the trend toward globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of one or more text categorization models from a set of preclassified training documents written in different languages, and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced model(s). Although PLTC can be approached as multiple independent monolingual text categorization problems, this naive approach uses only the training documents of the same language to construct each monolingual classifier and fails to exploit the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as the performance benchmark, our empirical evaluation shows that the proposed PLTC technique achieves higher classification accuracy than the benchmark technique on both the English and Chinese corpora.
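For illustration only, the naive baseline described in the abstract (one independent monolingual classifier per language, each trained solely on documents of its own language) might be sketched as follows. This is not the authors' implementation and it does not show the feature reinforcement step, whose details the abstract does not give. The choice of a small Naive Bayes classifier, the names `NaiveBayes`, `train_monolingual`, and `categorize`, and the toy training data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def __init__(self):
        self.class_docs = Counter()              # category -> number of training docs
        self.term_counts = defaultdict(Counter)  # category -> term frequencies
        self.vocab = set()

    def fit(self, docs):
        # docs: iterable of (tokens, category) pairs in a single language
        for tokens, category in docs:
            self.class_docs[category] += 1
            self.term_counts[category].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.class_docs.items():
            counts = self.term_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for term in tokens:
                if term in self.vocab:  # terms never seen in training are ignored
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


def train_monolingual(training_docs):
    """Train one classifier per language from (language, tokens, category) triples."""
    by_lang = defaultdict(list)
    for lang, tokens, category in training_docs:
        by_lang[lang].append((tokens, category))
    # Each classifier sees only its own language's documents (the naive baseline).
    return {lang: NaiveBayes().fit(docs) for lang, docs in by_lang.items()}


def categorize(models, lang, tokens):
    """Route an unclassified document to the classifier for its language."""
    return models[lang].predict(tokens)


# Toy, made-up training data: two categories in English and Chinese.
training = [
    ("en", ["stock", "market", "bank"], "finance"),
    ("en", ["match", "goal", "team"], "sports"),
    ("zh", ["股票", "银行"], "finance"),
    ("zh", ["比赛", "球队"], "sports"),
]
models = train_monolingual(training)
print(categorize(models, "en", ["bank", "stock"]))  # finance
print(categorize(models, "zh", ["球队"]))            # sports
```

In this sketch the English classifier never benefits from the Chinese training documents, and vice versa, which is the limitation the proposed approach is meant to address.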
Details
- Title
- Feature reinforcement approach to poly-lingual text categorization
- Creators
- Chih-Ping Wei - National Tsing Hua University
- Huihua Shi - National Sun Yat-sen University
- Christopher C. Yang - Chinese University of Hong Kong
- Contributors
- DHL Goh (Editor)
- T H Cao (Editor)
- I T Solvberg (Editor)
- E Rasmussen (Editor)
- Publication Details
- ASIAN DIGITAL LIBRARIES: LOOKING BACK 10 YEARS AND FORGING NEW FRONTIERS, PROCEEDINGS, v 4822
- Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature
- Number of pages
- 2
- Grant note
- NSC 93-2416-H-110-021; NSC 94-2416-H-110-002 / National Science Council of the Republic of China; Ministry of Science and Technology, Taiwan
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000252143400017
- Scopus ID
- 2-s2.0-38149032438
- Other Identifier
- 991021855285704721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Hardware & Architecture
- Computer Science, Information Systems
- Computer Science, Theory & Methods
- Information Science & Library Science