A Guide to Text Analysis with Latent Semantic Analysis in R with Annotated Code: Studying Online Reviews and the Stack Exchange Community
Journal article, open access

David Gefen, James E. Endicott, Jacob Miller, Jorge E. Fresneda and Kai R. Larsen
Communications of the Association for Information Systems, v 41(1), pp 450-496
01 Jan 2017
URL: https://doi.org/10.17705/1cais.04121
Published, Version of Record (VoR), Open Access (Publisher Bronze)

Abstract

Computer Science; Computer Science, Information Systems; Science & Technology; Technology
In this guide, we introduce researchers in the behavioral sciences in general, and MIS in particular, to text analysis using latent semantic analysis (LSA). The guide contains hands-on annotated code samples in R that walk the reader through a typical process: acquiring relevant texts, creating a semantic space from them, and then projecting words, phrases, or documents onto that semantic space to calculate their lexical similarities. R is a popular open-source programming language with extensive statistical libraries. We introduce LSA as a concept, discuss the process of preparing the data, and note its potential and limitations. We demonstrate this process through a sequence of annotated code examples. We start with a study of online reviews that extracts lexical insight about trust; that R code applies singular value decomposition (SVD). The guide next demonstrates a realistically large data analysis of Stack Exchange, a popular Q&A site for programmers; that R code applies an alternative sparse SVD method. All the code and data are available on github.com.
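The abstract summarizes the LSA pipeline: build a term-document matrix, reduce it with SVD to a low-dimensional semantic space, project words, phrases, or documents onto that space, and compare them by similarity. The guide's own R code is not reproduced in this record, so what follows is only a minimal Python/NumPy sketch under stated assumptions. The five-document corpus is hypothetical, the matrix uses raw term counts with no weighting or stop-word removal, the space keeps k = 2 dimensions, and cosine is the similarity measure. The projection `q^T U_k` is one common convention for folding in new text. This is not the authors' implementation.

```python
import numpy as np

# Hypothetical toy corpus (NOT the paper's data): three review-like texts
# and two programming Q&A texts.
docs = [
    "seller was honest and shipping was fast",
    "honest seller, trust this seller",
    "fast shipping and good price",
    "the code compiles but the function throws an error",
    "error in function call, code fails",
]

def tokenize(text):
    """Lowercase, split on whitespace, and strip trailing commas/periods."""
    return [w.strip(".,") for w in text.lower().split()]

tokens = [tokenize(d) for d in docs]
vocab = sorted({w for t in tokens for w in t})
index = {w: i for i, w in enumerate(vocab)}

def term_vector(words):
    """Raw term counts over the corpus vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in words:
        if w in index:
            v[index[w]] += 1
    return v

# Term-document matrix A (terms x documents), raw counts (assumption: no weighting).
A = np.column_stack([term_vector(t) for t in tokens])

# Truncated SVD: keep the k largest singular triplets as the semantic space.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Document coordinates V_k S_k; identical to projecting each column of A with U_k.
doc_vecs = Vt_k.T * s_k

def project(text):
    """Project a word, phrase, or document onto the semantic space as q^T U_k."""
    return term_vector(tokenize(text)) @ U_k

def cosine(a, b):
    """Cosine similarity between two vectors in the semantic space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = project("honest seller")
for d, vec in zip(docs, doc_vecs):
    print(f"{cosine(query, vec):+.3f}  {d}")
```

In this toy corpus the review texts and the programming texts share no vocabulary, so each group collapses onto one of the two retained dimensions. The query "honest seller" should therefore score near 1 against all three reviews, including "fast shipping and good price", which shares no word with the query but co-occurs with its terms through the first review. It should score near 0 against the two programming texts. For a corpus the size of Stack Exchange, a dense `np.linalg.svd` becomes impractical. A truncated sparse solver such as `scipy.sparse.linalg.svds`, which computes only the top k triplets, is the usual substitute. The abstract notes that the paper's Stack Exchange code uses an alternative sparse SVD method in R, and this sketch does not reproduce it.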

Metrics

33 Record Views
48 citations in Scopus


InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
Web of Science research areas
Computer Science, Information Systems