Journal article
A Guide to Text Analysis with Latent Semantic Analysis in R with Annotated Code: Studying Online Reviews and the Stack Exchange Community
Communications of the Association for Information Systems, v 41(1), pp 450-496
01 Jan 2017
Abstract
In this guide, we introduce researchers in the behavioral sciences in general and MIS in particular to text analysis as done with latent semantic analysis (LSA). The guide contains hands-on annotated code samples in R that walk the reader through a typical process of acquiring relevant texts, creating a semantic space out of them, and then projecting words, phrases, or documents onto that semantic space to calculate their lexical similarities. R is a popular open-source programming language with extensive statistical libraries. We introduce LSA as a concept, discuss the process of preparing the data, and note its potential and limitations. We demonstrate this process through a sequence of annotated code examples. We start with a study of online reviews that extracts lexical insight about trust; that R code applies singular value decomposition (SVD). The guide next demonstrates a realistically large data analysis of Stack Exchange, a popular Q&A site for programmers; that R code applies an alternative sparse SVD method. All the code and data are available on github.com.
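The process the abstract describes (build a semantic space with SVD, project texts onto it, compare them) can be illustrated with a minimal sketch. This is not the article's code: it is written in Python with NumPy rather than R, the four-document corpus is invented for illustration, and the helper names `project` and `cosine` and the choice of `k = 2` dimensions are assumptions.

```python
import numpy as np

# Toy corpus (hypothetical; not the article's data).
docs = [
    "trust seller honest reliable",
    "seller honest trust delivery",
    "python loop error code",
    "code error python function",
]

# Term-document count matrix: rows are terms, columns are documents.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1.0

# Full SVD, then keep the k largest singular values to form the semantic space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed dimensionality for this toy example
U_k, s_k = U[:, :k], s[:k]


def project(text):
    """Fold a text into the k-dimensional space: q^T U_k S_k^{-1}."""
    q = np.zeros(len(vocab))
    for w in text.split():
        if w in index:  # words outside the vocabulary are ignored
            q[index[w]] += 1.0
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity between two projected vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


doc_vectors = [project(d) for d in docs]
review = project("honest reliable seller")
question = project("python function error")

print("review vs. trust document:", cosine(review, doc_vectors[0]))
print("review vs. programming document:", cosine(review, doc_vectors[2]))
print("review vs. question:", cosine(review, question))
```

In this toy space the new review lands close to the trust-related documents and far from the programming ones. A realistic analysis would add the preprocessing and term weighting the guide discusses, and would use a sparse SVD routine for a large corpus.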
Details
- Title
- A Guide to Text Analysis with Latent Semantic Analysis in R with Annotated Code: Studying Online Reviews and the Stack Exchange Community
- Creators
- David Gefen - Drexel University
- James E. Endicott - University of Colorado System
- Jacob Miller - Drexel University
- Jorge E. Fresneda - Drexel University
- Kai R. Larsen - University of Colorado System
- Publication Details
- Communications of the Association for Information Systems, v 41(1), pp 450-496
- Publisher
- Association for Information Systems
- Number of pages
- 47
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Decision Sciences (and Management Information Systems); Management
- Web of Science ID
- WOS:000414858200021
- Scopus ID
- 2-s2.0-85034024789
- Other Identifier
- 991019168597504721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Computer Science, Information Systems