automatic term recognition; Collaboration; consumer health vocabulary; cross-lingual word vector space; health consumer-generated content; Large language models; Limiting; Medical services; online health community; User-generated content; Vocabulary; Semantics
The online health community (OHC) is the primary channel through which laypeople share health information. A critical challenge in analyzing health consumer-generated content (HCGC) from OHCs is identifying the colloquial medical expressions that laypeople use. The open-access and collaborative consumer health vocabulary (OAC CHV) is the controlled vocabulary designed to address this challenge. However, OAC CHV is available only in English, which limits its applicability to other languages. This research proposes a cross-lingual automatic term recognition framework that extends the English CHV into a cross-lingual one. The framework takes as input an English HCGC corpus and a non-English HCGC corpus (Chinese in this study). Two monolingual word vector spaces are trained with the skip-gram algorithm so that each space encodes the word associations common among laypeople in that language. Based on the isometry assumption, the framework aligns the two monolingual spaces into a single bilingual word vector space, in which cosine similarity serves as the metric for identifying semantically similar words across languages. The experimental results show that the framework outperforms two large language models used for comparison in identifying CHV across languages. The framework requires only raw HCGC corpora and a small set of medical translations, reducing the human effort needed to compile a cross-lingual CHV.
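The alignment and retrieval steps described in the abstract can be illustrated with a minimal sketch. It assumes the two monolingual embedding matrices already exist (skip-gram training is not shown) and uses orthogonal Procrustes, one common way to align spaces under an isometry assumption; the record does not state which solver the authors use. The function names, the toy vectors, and the six word pairs are all hypothetical and serve only as illustration.

```python
import numpy as np


def learn_orthogonal_map(src_seed, tgt_seed):
    """Orthogonal Procrustes: the orthogonal W minimizing ||src_seed @ W - tgt_seed||_F."""
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt


def nearest_terms(query_vec, tgt_matrix, tgt_words, k=3):
    """Rank target-language words by cosine similarity to a mapped query vector."""
    tgt_unit = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    sims = tgt_unit @ (query_vec / np.linalg.norm(query_vec))
    top = np.argsort(-sims)[:k]
    return [(tgt_words[i], float(sims[i])) for i in top]


# Hypothetical toy data: random "English" vectors, and "Chinese" vectors built
# as an exact rotation of them, so the isometry assumption holds perfectly.
rng = np.random.default_rng(0)
en_words = ["palpitations", "headache", "insomnia", "cough", "fever", "stomachache"]
zh_words = ["心悸", "頭痛", "失眠", "咳嗽", "發燒", "胃痛"]
en_vecs = rng.normal(size=(6, 3))
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # a random orthogonal matrix
zh_vecs = en_vecs @ rotation

# The first four pairs play the role of the small seed set of medical translations.
W = learn_orthogonal_map(en_vecs[:4], zh_vecs[:4])

# Map a held-out English term into the Chinese space and retrieve its neighbors.
candidates = nearest_terms(en_vecs[4] @ W, zh_vecs, zh_words, k=3)
print(en_words[4], "->", candidates)
```

Real embedding spaces are only approximately isometric, so in practice the top-ranked candidates would be treated as suggestions for review instead of confirmed translations.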
Details
Title
Constructing Cross-Lingual Consumer Health Vocabulary with Word-Embedding from Comparable User Generated Content
Creators
Chia-Hsuan Chang - Drexel University
Lei Wang - Drexel University
Christopher C. Yang - Drexel University
Publication Details
2024 IEEE 12th International Conference on Healthcare Informatics (ICHI), pp 275-284
Publisher
IEEE
Number of pages
10
Grant note
IIS-1741306, IIS-2235548 / National Science Foundation (10.13039/100000001)
Resource Type
Conference proceeding
Language
English
Academic Unit
Information Science
Web of Science ID
WOS:001304501700035
Scopus ID
2-s2.0-85203684529
Other Identifier
991021901005004721