Text Preprocessing
Book chapter

Murugan Anandarajan, Chelsey Hill and Thomas Nolan
Practical Text Analytics, pp 45-59
20 Oct 2018

Abstract

Keywords: Lemmatization; n-grams; Natural language processing; POS tagging; Stemming; Stop words; Text parsing; Text preprocessing; Tokens
This chapter begins the process of preparing text data for analysis. It introduces the choices available for cleansing text data, including tokenizing, standardizing and cleaning, removing stop words, and stemming. It also covers advanced topics in text preprocessing, such as n-grams, part-of-speech tagging, and custom dictionaries. These preprocessing decisions influence the text document representation created for analysis.
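
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's own implementation: the stop-word list, the suffix-stripping rule, and the function names (`tokenize`, `remove_stop_words`, `naive_stem`, `ngrams`, `preprocess`) are all assumptions chosen for illustration, and the stemmer is a crude stand-in for an established algorithm such as Porter's.

```python
import re

# Illustrative stop-word list (an assumption, not the chapter's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "for", "in"}


def tokenize(text):
    """Standardize to lowercase and split the text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


terms = preprocess("The chapters are cleaning and tokenizing the texts.")
bigrams = ngrams(terms)
print(terms)
print(bigrams)
```

Changing any one choice here (a different stop-word list, a different stemmer, unigrams versus bigrams) changes the resulting terms, which is the sense in which preprocessing decisions shape the document representation.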
