32 Text Analytics in R

The content is under development/finalisation

Text analytics is an important part of data analytics because text data is increasingly significant across many applications, including marketing analytics. Text is often preferred over other forms of unstructured data because it is inexpensive to collect and frequently up to date. To fully leverage text data, we need to understand how to process, clean, summarize, and model it. In this chapter, we use R's workhorse text tools to start working with text efficiently. We build skills in wrangling and visualizing text so that we can perform sentiment analysis in the next chapter. We also briefly discuss running and interpreting topic models. Together, these topics show the central role that text analytics plays in modern data analysis.

Text data processing faces unique challenges because human language is complex and variable. Unlike structured data, text has no fixed schema and is highly diverse, spanning different languages, dialects, slang, and abbreviations. This variability makes standardization harder and calls for careful preprocessing to clean and prepare the data. In addition, meaning depends on context: the same word or phrase can mean different things depending on how it is used (for example, "bank" as a financial institution or as the side of a river).

R for Audit Analytics

Text pre-processing

32.1 Tokenisation
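Tokenisation splits raw text into smaller units (tokens), most commonly single words. The sketch below uses only base R so that it runs without extra packages. The helper name `tokenise_words()` and the sample sentences are hypothetical, and the splitting rule (anything other than ASCII letters, digits and apostrophes separates tokens) is a simplifying assumption. Packages such as tidytext offer `unnest_tokens()` for the same step on data frames.

```r
# Minimal word tokeniser in base R (hypothetical helper, not a package API)
tokenise_words <- function(text) {
  # Lower-case so "Audit" and "audit" become the same token
  text <- tolower(text)
  # Split on any run of characters that is not an ASCII letter, digit or apostrophe
  tokens <- unlist(strsplit(text, "[^a-z0-9']+"))
  # strsplit() leaves an empty string when the text starts with a separator
  tokens[tokens != ""]
}

docs <- c("The auditor reviewed the invoices.",
          "Invoices were approved late!")

# One character vector of tokens per document
tokens_by_doc <- lapply(docs, tokenise_words)
tokens_by_doc
```

Lower-casing before splitting is a design choice: it merges case variants, but it also loses information such as proper nouns or acronyms.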

32.2 Stop words
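Stop words are very common words ("the", "and", "of") that carry little meaning on their own and are usually removed before counting or modelling. The sketch below assumes the text has already been tokenised and uses a small, hypothetical stop-word list for illustration. Real analyses typically rely on a curated lexicon, such as the `stop_words` data set shipped with tidytext, often applied with `dplyr::anti_join()`.

```r
# A small, hypothetical stop-word list (real projects use a curated lexicon)
stop_list <- c("the", "a", "an", "and", "of", "to", "were", "was", "is")

# Tokens assumed to come from an earlier tokenisation step
tokens <- c("the", "auditor", "reviewed", "the", "invoices",
            "and", "were", "approved")

# Keep only tokens that do not appear in the stop-word list
content_tokens <- tokens[!tokens %in% stop_list]
content_tokens
```

Which words count as "stop words" depends on the task. For example, negations such as "not" are often in generic lists but matter for sentiment analysis.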

32.3 Stemming
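Stemming reduces related word forms to a common stem (for example, "auditing", "audited" and "audits" to "audit") so that they are counted together. The function below is a deliberately simple, hypothetical suffix stripper written in base R to show the idea. It is not the Porter algorithm; for real work, a tested implementation such as `SnowballC::wordStem()` is a common choice.

```r
# Toy suffix-stripping stemmer for illustration only (NOT the Porter algorithm)
simple_stem <- function(words) {
  # Longer suffixes are tried first so "ing" is removed before "s"
  suffixes <- c("ing", "ed", "es", "s")
  vapply(words, function(w) {
    for (suf in suffixes) {
      # Only strip when a stem of at least 3 characters remains
      if (endsWith(w, suf) && nchar(w) - nchar(suf) >= 3) {
        return(substr(w, 1, nchar(w) - nchar(suf)))
      }
    }
    w  # no rule applied: return the word unchanged
  }, character(1), USE.NAMES = FALSE)
}

simple_stem(c("auditing", "audited", "audits", "invoices"))
```

The stem for "invoices" comes out as "invoic", which illustrates that stems need not be dictionary words. When readable base forms are required, lemmatisation is the usual alternative.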

32.4 Text cleaning by removing punctuation and other unwanted characters
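Cleaning removes characters that usually add noise to word-level analysis, such as punctuation, digits, URLs and repeated whitespace. The sketch below chains base R `gsub()` calls with POSIX character classes. The helper name `clean_text()`, the sample strings and the exact set of rules are illustrative assumptions rather than a fixed recipe.

```r
# Hypothetical raw text with punctuation, digits, a URL and extra spaces
raw <- c("Invoice #4521 was PAID late!!  See https://example.com/inv",
         "Payment   approved -- 3 days overdue.")

clean_text <- function(x) {
  x <- tolower(x)                                  # normalise case
  x <- gsub("https?://[^[:space:]]+", " ", x)      # drop URLs
  x <- gsub("[[:punct:]]+", " ", x)                # drop punctuation
  x <- gsub("[[:digit:]]+", " ", x)                # drop numbers
  x <- gsub("[[:space:]]+", " ", x)                # collapse runs of whitespace
  trimws(x)                                        # trim leading/trailing spaces
}

clean_text(raw)
```

The order matters: URLs are removed before punctuation, otherwise their slashes and dots would be stripped first and leave fragments behind. Whether to drop numbers depends on the task; in audit data, amounts and invoice numbers may be exactly what the analysis needs.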