Topic modeling for natural language understanding
Dissertation, open access

Xiaoli Song
Doctor of Philosophy (Ph.D.), Drexel University
Dec 2016
DOI: https://doi.org/10.17918/etd-7143

Abstract

Information science
This thesis presents new topic modeling methods that reveal underlying language structure. Topic models have achieved many successes in natural language understanding. Despite these successes, deeper exploration of topic modeling for language processing and understanding requires studying language itself, and much remains to be explored. This thesis therefore combines the study of topic modeling with the exploration of language. Two types of language are explored: normal document text and spoken language text. Normal document text includes written text such as news articles and research papers. Spoken language text refers to human speech directed at machines, such as smartphones, to obtain a specific service. The main contributions of this thesis fall into two parts. The first part is the extraction of word/topic relation structure through the modeling of word pairs. Although word/topic relation structure has long been recognized as key to language representation and understanding, few researchers have explored the actual relations between words/topics within a statistical model. This thesis introduces a pairwise topic model to examine the relation structure of texts. The pairwise topic model is applied to different document texts, such as news articles, research papers, and medical records, to obtain word/topic transitions and topic evolution. The second contribution is topic modeling for spoken language. Spoken language understanding involves processing spoken language and determining how it maps to the actions the user intends. This thesis examines the semantic and syntactic structure of spoken language in detail and provides insight into its structure. It also proposes a new topic modeling method that incorporates these linguistic features.
The model can also be extended to incorporate prior knowledge, resulting in better interpretation and understanding of spoken language.
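The abstract does not specify how the pairwise topic model is defined. As a hedged illustration only, the sketch below shows the kind of word-pair statistics such a model could start from: it collects adjacent word pairs from tokenized documents and normalizes their counts into word-to-word transition probabilities. Treating a "pair" as two adjacent tokens, and the helper names `word_pairs` and `transition_probs`, are assumptions for illustration and not the thesis's method. A pairwise topic model would additionally assign latent topics to the pairs, which this sketch does not attempt.

```python
from collections import Counter, defaultdict


def word_pairs(tokens):
    """Return adjacent (w_i, w_{i+1}) pairs from one tokenized document.

    Assumption: a 'word pair' means two adjacent tokens (hypothetical choice).
    """
    return list(zip(tokens, tokens[1:]))


def transition_probs(docs):
    """Estimate P(next word | word) from adjacent-pair counts across documents."""
    counts = defaultdict(Counter)
    for tokens in docs:
        for w1, w2 in word_pairs(tokens):
            counts[w1][w2] += 1
    # Normalize each row of counts into a probability distribution.
    probs = {}
    for w1, nexts in counts.items():
        total = sum(nexts.values())
        probs[w1] = {w2: c / total for w2, c in nexts.items()}
    return probs


docs = [
    "the model learns topic structure".split(),
    "the model learns word pairs".split(),
]
probs = transition_probs(docs)
print(probs["learns"])  # {'topic': 0.5, 'word': 0.5}
```

Counting pairs rather than single words keeps the ordering information that a bag-of-words topic model discards, which is the kind of relation structure the abstract refers to.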