Language is an integral part of marketing. Consumers share word of mouth, salespeople pitch services, and advertisements try to convince consumers to buy. Retail employees answer questions, customer service agents try to solve problems, and movies, books, and other cultural products use language to entertain and inform. Even consumers’ private thoughts are expressed using language.

Further, small differences in wording can have a big impact. The exact words used in word of mouth can shape its influence (Packard & Berger, 2017; Berger, Rocklage, & Packard, 2022; Moore, 2012), the language service agents use shapes customer satisfaction (Packard et al., 2018), and the words used in books, movies, and other cultural products shape their success (Berger et al., 2021).

But while it is clear that language is both frequent and important, how can we extract insight from this increasingly available form of data?

The digitization of content has created a wealth of textual information. Online reviews capture what consumers talk about and why, and social media posts shed light on brand perceptions. Customer service calls can be transcribed to understand what drives customer satisfaction, and experimental participants provide thought protocols that can be parsed for deeper insight into the mechanisms driving behavior.

But parsing this wealth of text requires the right tools: objective, scalable methods that turn unstructured language into quantifiable data.

Building on recent work (e.g., Berger et al., 2020; Humphreys & Wang, 2018; Shankar & Parsana, 2022), this paper offers an accessible, hands-on introduction to three main approaches to automated textual analysis (i.e., dictionaries, topic modeling, and embeddings). We suggest these approaches can be thought of as tools to help understand the what, how, and why of consumer and marketing language. For those interested in what is being talked about (i.e., the topics or themes discussed), topic modeling and embedding-based approaches can be particularly useful. For those interested in how something is being talked about, or why (e.g., what motivations might be reflected), dictionary-based approaches can be particularly helpful.

We provide a brief summary of each approach, some examples of how it has been used, and some advantages and limitations. Further, we outline how these approaches can be used both in empirical analysis of field data as well as experiments. Finally, an appendix provides links to relevant tools and readings to help readers dive deeper.

While a detailed discussion of all the methods and uses of textual analysis is beyond the scope of this paper, we hope it provides useful pointers to places where readers can learn more.

1 Dictionaries

Some of the most user-friendly methods for text analysis are top-down, dictionary-based approaches. These approaches rely on a pre-existing list—i.e., a dictionary—of words, phrases, or symbols that are counted in a piece of text. For example, if researchers want to measure how certain consumers are, they might search their text using a dictionary that contains words and phrases such as “I’m convinced,” “don’t know,” and “absolutely” to represent the construct (Rocklage et al., 2022). If researchers are interested in measuring how self-focused consumers are, they might use a dictionary that contains words like “I,” “me,” and “mine” (Spiller & Belogolova, 2017). Each of these words is searched for in the target text, and the matches are summed. Texts with greater use of “me,” for example, would receive higher “self-focused” scores because more dictionary matches would be found.
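To make this concrete, a minimal Python sketch of this count-and-normalize logic might look as follows; the self-focus word list here is a toy stand-in for a validated dictionary:

```python
# A minimal sketch of dictionary-based scoring; the "self-focus" word list
# below is a toy stand-in for a validated dictionary such as LIWC's.
self_focus_words = {"i", "me", "my", "mine", "i'm"}

def dictionary_score(text, dictionary):
    """Return the percentage of words in `text` that match `dictionary`."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    matches = sum(w in dictionary for w in words)
    return 100 * matches / len(words) if words else 0.0

review = "I loved my stay and I'm sure I will be back"
print(dictionary_score(review, self_focus_words))  # ~36% self-focused words
```

Expressing the count as a percentage of total words, as most dictionary tools do, keeps scores comparable across texts of different lengths.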

This method is particularly useful for getting started with automated text analysis because dictionary software is generally easy to use and free, and there are many standardized dictionaries to choose from (Humphreys & Wang, 2018). Researchers can measure constructs using the Linguistic Inquiry and Word Count software (Boyd et al., 2022), sentiment/attitudes using the Evaluative Lexicon (Rocklage et al., 2018a, 2018b), and nonverbal cues using the textual paralanguage classifier (Luangrath, Xu, & Wang, 2022), to name just three examples (see Web Appendix for more). Each of these uses a slightly different approach to quantify language, but all rely on a dictionary to search for words of interest.

1.1 Linguistic inquiry and word count

One widely used set of dictionaries is Linguistic Inquiry and Word Count (LIWC; Boyd et al., 2022). LIWC includes a range of wordlists, many of which were developed based on psychological scales. For example, LIWC includes a wordlist for measuring positive and negative emotion based on the PANAS scale (Watson et al., 1988). LIWC includes 20 measures of linguistic features (e.g., verb tense), 60 psychological categories (e.g., emotion, cognition), and 19 substantive categories (e.g., leisure), in addition to measures of punctuation (Boyd et al., 2022). Higher-level categories can be used to summarize other subgroups in the software. For example, “clout” is a combined measure of first-person plural pronouns (“we”), negations, and swear words (Jordan & Pennebaker, 2015). The most recent version adds tools to build word clouds, identify language style matching, and find narrative structure. To normalize for text length (e.g., the number of words in an online post), the LIWC software reports each category as a percentage of total words.

LIWC’s dictionaries have been validated on a range of materials such as academic abstracts, English literature texts, and other spoken and written material (King & Pearce, 2010; Tausczik & Pennebaker, 2010). LIWC has been used to assess social acceptance in news media (Humphreys, 2010), emotional contagion (Berger & Milkman, 2012), attentional and social focus in tweets (Barasch & Berger, 2014), and market logics (Ertimur & Coskuner-Balli, 2015). LIWC also allows researchers to create custom dictionaries to measure other constructs (Humphreys & Wang, 2018). And although scholars have found more precise ways to measure some constructs like sentiment (Hartmann et al., 2019), LIWC remains a good place to start (www.liwc.app).

1.2 The Evaluative Lexicon

Another example of a dictionary approach is the Evaluative Lexicon (EL; Rocklage & Fazio, 2015; Rocklage et al., 2018a, 2018b). The EL is a validated measure of the valence, extremity, and emotionality of individuals’ opinions in language. To construct the dictionary, researchers used billions of words, millions of online reviews, and the judgments of a large set of external raters. Based on this data-driven approach, the EL searches only for words that provide a reliable signal of individuals’ opinions in natural language. The final dictionary includes words such as “magnificent,” “problematic,” and “flavorful.”

Rather than simply counting whether a word is present or not in a piece of text, the EL gives a score to each word in its dictionary based on validated external ratings. For example, the word “flawless” has a score of 8.24 on valence (out of 9.00), 3.74 on extremity (out of 4.50), and 3.05 on emotionality (out of 9.00). On the other hand, “elated” signals a very different opinion: one that is equally positive, but based more on emotion (scores of 8.20, 3.70, and 7.11, respectively). The EL dictionary and its scores have been extensively validated and applied across social media posts, audio transcripts, consumer reviews, and a number of other contexts (Berger, Rocklage, & Packard, 2022; Rocklage & Luttrell, 2021; Rocklage et al., 2021). It is available at www.EvaluativeLexicon.com.
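To illustrate how such weighted scoring differs from simple counting, here is a hedged sketch in the spirit of the EL; the two entries use the ratings cited above, whereas the full validated dictionary contains many more words:

```python
# A sketch of weighted dictionary scoring in the spirit of the Evaluative
# Lexicon; these two entries use the ratings cited in the text, whereas
# the full validated EL contains many more words.
el_ratings = {
    "flawless": {"valence": 8.24, "extremity": 3.74, "emotionality": 3.05},
    "elated":   {"valence": 8.20, "extremity": 3.70, "emotionality": 7.11},
}

def el_score(text, dimension):
    """Average the matched words' ratings on one EL dimension."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [el_ratings[w][dimension] for w in words if w in el_ratings]
    return sum(scores) / len(scores) if scores else None

print(el_score("The meal was flawless and left me elated", "emotionality"))  # 5.08
```

Averaging matched ratings, rather than counting matches, lets the same text receive distinct scores on valence, extremity, and emotionality.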

1.3 Textual paralanguage classifier

The textual paralanguage classifier (PARA) identifies nonverbal communication cues in text (Luangrath, Xu, & Wang, 2022). In contrast to other tools that rely predominantly on the words themselves, PARA takes an alternative approach and focuses on nonverbal parts of speech. The tool detects 19 different auditory, tactile, and visual features of text (Luangrath et al., 2017). For example, vocal aspects of text can convey stress with CAPS (e.g., GREAT), emphasis (e.g., !!!!), tempo (e.g., amazingggggg, in this case denoted with “stretchable words”), vocalizations (e.g., ugh or ahh), body language such as emojis, or facial expressions via emoticons (e.g., :-D), among others. These markers influence perceptions of sentiment valence and intensity, and detecting them improves prediction accuracy of consumer engagement on social media (Luangrath, Xu, & Wang, 2022). The PARA software operates using a set of five sub-dictionaries and rule-based algorithms. PARA is particularly helpful for more informal text, such as social media data, customer service chats, email, blogs, comments, text created in apps, or any content generated via mobile device, as these often contain textual paralanguage. It is less helpful when analyzing formal text (e.g., shareholder reports). PARA can be found at www.textualparalanguage.com.
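To give a flavor of how rule-based detection of such cues could work, consider the simplified sketch below; these regular expressions are illustrative stand-ins rather than PARA’s actual sub-dictionaries and rules:

```python
# An illustrative, rule-based sketch of paralanguage detection; these
# regular expressions are simplified stand-ins for PARA's actual
# sub-dictionaries and rules.
import re

patterns = {
    "caps_stress": re.compile(r"\b[A-Z]{2,}\b"),   # e.g., GREAT
    "emphasis":    re.compile(r"[!?]{2,}"),        # e.g., !!!!
    "stretchable": re.compile(r"(\w)\1{2,}"),      # e.g., amazingggggg
    "emoticon":    re.compile(r"[:;]-?[)(DP]"),    # e.g., :-D
}

def paralanguage_counts(text):
    """Count occurrences of each paralanguage cue in `text`."""
    return {name: len(p.findall(text)) for name, p in patterns.items()}

print(paralanguage_counts("This was AMAZING!!! soooo good :-D"))
# {'caps_stress': 1, 'emphasis': 1, 'stretchable': 1, 'emoticon': 1}
```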

1.4 When to use

Dictionary approaches are useful when text can be specified in relatively precise or finite ways that can be easily represented by word presence or absence. For that reason, they excel at measuring individual and cultural focus (i.e., what is being attended to) or emphasis on a particular subject or construct (Humphreys & Wang, 2018; Tausczik & Pennebaker, 2010). Because words are specified a priori, dictionary methods also perform well when researchers have a firm idea of the operationalization of constructs in the text. And because there are many well-validated dictionaries available, the approach allows for concurrent and construct validity when working across studies.

When it is difficult to specify the operationalization of constructs, however, or when measuring the construct requires studying sentence structure or inter-relation of words within a sentence, other methods may be helpful. Similarly, while dictionaries can be used broadly across contexts, classifiers or other machine learning approaches designed for prediction may perform better in very specific contexts. The meaning of words like “we” and “our,” for example, may be quite different in conversation than in academic papers.

2 Topic modeling

Beyond the individual words companies, consumers, or employees use, what broader topics or themes are they talking about? Do hotel reviews tend to talk about the room, the service, or the food? Should retail employees focus on customer needs or the products offered? And in an experimental context, does a manipulation impact what topics study participants focus on?

Topic models can answer these questions and more. Rather than focusing on top-down, pre-determined constructs, as is often the case with dictionaries, topic modeling is usually bottom-up, using words that co-occur within and across texts to determine the latent topics that appear. Based on this, the method outputs different topics and the words associated with them. This, in turn, can be used to identify how much of a given text is about each latent topic. For a travel review, for example, 51% of the review might be about a hotel room, 25% about the front desk, and 24% about the restaurant.

One common topic modeling approach is latent Dirichlet allocation (LDA; Blei et al., 2003), although a variety of options are available (Vayansky & Kumar, 2020). While some of these lean toward more complex neural network approaches, basic topic modeling can be performed in a straightforward manner using R or Python, and less technical users can simply upload a text file at a website to generate LDA results (see www.textanalyzer.org).
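As an example, the following minimal Python sketch fits an LDA model with scikit-learn; the toy reviews and the choice of three topics are illustrative assumptions:

```python
# A minimal sketch of LDA topic modeling with scikit-learn; the example
# reviews and the number of topics are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "The room was spacious and the bed was comfortable",
    "Front desk staff were friendly and check in was fast",
    "The restaurant food was delicious and breakfast was great",
]

# Convert texts to a document-term matrix of word counts
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(reviews)

# Fit an LDA model with a researcher-chosen number of latent topics
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(dtm)  # rows: documents, columns: topic shares

# Top words per topic help the researcher interpret each theme
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {top}")

# Each review's estimated topic proportions (e.g., mostly "room" vs. "desk")
print(doc_topics.round(2))
```

In practice, far larger corpora are needed for stable topics, and the number of topics is typically chosen with fit statistics (e.g., coherence) plus researcher judgment.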

Marketing researchers have used topic models in a variety of novel ways. Tirunillai and Tellis (2014) explore dimensions linked to quality, how they change over time, and how that relates to competitive brand positioning. Li and Ma (2020) show how marketers can use topic modeling of consumer search terms to identify where consumers are in the decision-making process. The approach was used to find spoilers in movie reviews, which surprisingly help, rather than hurt, ticket sales (Ryoo et al., 2021). Topic modeling can also be used to find and examine specific psychological constructs relevant to marketing. Zhang, Li, and Ng (2021) performed guided LDA by seeding it with an initial set of words associated with warmth and competence, and then scored thousands of brands appearing in Yelp reviews according to those perceptual dimensions. Chung and colleagues (2022) used the approach to uncover the motivations (e.g., intrinsic vs. financial) of people who rent their properties on Airbnb. Topic model results can also be useful as control variables, such as accounting for different topics that might arise in customer service conversations (Packard & Berger, 2021).

In addition to field data, topic modeling can also be used on the language produced in experiments. Researchers could analyze thought listings after a manipulation, for example, to see if thoughts differ across conditions in conceptually or substantively meaningful ways. This approach might be especially useful when self-report scales are not available, when participants have less insight into their own attitudes, or when response bias may lead to inaccurate responding.

2.1 When to use

Like any method, topic modeling has limitations. While fit statistics such as coherence or perplexity can help, interpretation of each topic’s theme or meaning is ultimately up to the researcher, leaving considerable degrees of freedom when topic meaning is important. Independent judges can be used to score the topics in such cases. What’s more, topic modeling does not account for the proximity of words within texts. Even if “river” and “bank” appear several sentences or paragraphs away from each other, topic modeling may still treat them as related. Embeddings can help address this shortcoming, as can embedded topic models that combine aspects of both approaches (Dieng et al., 2020).

3 Embeddings

Word embedding models have emerged as a popular way to capture semantic information contained in text without labor-intensive manual labeling. These models rely on statistical algorithms to learn semantic representations from word co-occurrence patterns in natural language (e.g., Bullinaria & Levy, 2012; Landauer & Dumais, 1997; Lenci, 2018). They examine the appearance of a word across different contexts (i.e., surrounding words) and represent it as a dense numerical vector—often with tens or hundreds of dimensions—in a vector space. This allows for performing mathematical operations on text, such as calculating how different words, paragraphs, or entire texts are related (e.g., using measures like cosine similarity).

Importantly, the mapping of words to vector representations is based not only on co-occurrence and frequency, but also on context. Consequently, words used in similar ways have similar vector representations. “Dog” and “cat” (i.e., pets) may appear close together in vector space, for example, as might “banana” and “blueberry” (i.e., fruits), but “dog” and “blueberry” should appear farther apart. Moreover, such vector spaces can capture analogies between words. Subtracting the vector representation of “men” from that of “king,” for example, yields a vector approximately equal to the one obtained by subtracting “women” from “queen” (Mikolov et al., 2013). That is, given the analogy to solve: “king” is to “men” as “queen” is to “___”, these vector spaces can correctly predict that “women” should be the answer.
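The sketch below illustrates both properties using pretrained GloVe vectors loaded through gensim’s downloader; the particular model named here is one of several available options:

```python
# A sketch using pretrained GloVe vectors via gensim's downloader; the
# model name is one of several available options, and the first call
# downloads the vectors (roughly 130 MB).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

# Words used in similar contexts lie closer together in vector space
print(wv.similarity("dog", "cat"))        # relatively high
print(wv.similarity("dog", "blueberry"))  # relatively low

# The classic analogy: king - man + woman should land near "queen"
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```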

Word embedding models thus quantify the semantic relations between different words, such that their degree of contextual overlap indicates their semantic relatedness (Boleda, 2020; Harris, 1970; Lenci, 2018). Importantly, this relatedness extends to higher-order relationships beyond direct co-occurrence. For example, synonyms rarely co-occur, as usually only one is used in a given context, yet their closeness in meaning is reliably captured by word embedding models (Bullinaria & Levy, 2012). As such, word embedding models trained on large text corpora have access not only to semantic relationships between, for example, product categories and brands (e.g., “fast food” and “McDonald’s”), but also to their relationships with shared concepts (e.g., “cheap,” “hamburger,” and “drive-through”).

Because of their relative novelty and data requirements, word embedding techniques have seen fewer applications within marketing, at least so far. Nonetheless, several recent papers demonstrate embeddings’ potential to address a variety of important questions. Gabel et al. (2019) utilize a “product embeddings” technique (P2V-MAP) on market basket data from a grocery retailer to quantify latent, attribute-level similarities between products, and thereby map market structures (e.g., product complementarity vs. competition). They find, for example, that wines form distinct clusters along price ranges, likely reflecting consumer loyalty to specific price tiers for wine (Jarvis & Goodman, 2005). Timoshenko and Hauser (2019) demonstrate how word embeddings can identify customer needs from product reviews, while offering important advantages over more conventional techniques (e.g., interviews and focus groups). Toubia, Berger, and Eliashberg (2021) use embeddings to quantify the speed, volume, and circuitousness of texts, demonstrating that these features help explain whether books, movies, academic papers, and other cultural products succeed or fail (see also Laurino Dos Santos & Berger, 2022). Bhatia and Olivola (2018, 2021) show that word embeddings can predict the subjective dimensions of brand perception (e.g., brand personality traits; Aaker, 1997) for hundreds of brands and evaluation dimensions, and then quantify and map the associations between brands and a rich variety of concepts. Such semantic maps, in turn, can serve as a foundation to study many interesting questions. For example, Aka et al. (2020) relied on this approach to link perceptions of brands to the personality traits of consumers who “like” them on Facebook, testing whether consumers prefer brands that “fit” their own psychological tendencies (see Nave et al., 2020, for a similar approach). Finally, Zhang et al. (2018) show that word embedding models trained on large text corpora can predict consumer brand recall without having to collect additional survey data.

3.1 When to use

While embeddings are quite useful, they are not without limitations. Given the key assumption that related words tend to appear in similar contexts, word representations depend on the properties of the text corpus used to learn them. In some cases, researchers will want to utilize word embeddings trained on text corpora tailored to their research questions (e.g., using a time-stamped corpus of tweets to study the evolution of brand perceptions on social media). In practice, however, training and validating such models requires access to very large text corpora with millions or even billions of words. Consequently, off-the-shelf embedding representations, learned from large and rich text corpora (e.g., Google News, Twitter, and Wikipedia), are often used (e.g., https://code.google.com/archive/p/word2vec/).
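For researchers who do train their own embeddings, a minimal sketch using gensim’s Word2Vec might look as follows; the toy corpus is purely illustrative, as reliable vectors require corpora with millions of words:

```python
# A minimal sketch of training custom embeddings with gensim's Word2Vec.
# The toy corpus is purely illustrative: in practice, reliable vectors
# require corpora with millions (or billions) of words.
from gensim.models import Word2Vec

# Each document is tokenized into a list of lowercase words
corpus = [
    ["the", "room", "was", "clean", "and", "spacious"],
    ["front", "desk", "staff", "were", "friendly", "and", "helpful"],
    ["the", "room", "was", "quiet", "and", "comfortable"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # dimensionality of the word vectors
    window=3,        # context window around each target word
    min_count=1,     # keep every word (sensible only for a toy corpus)
    epochs=50,
    seed=0,
)

print(model.wv.most_similar("room", topn=3))
```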

Technical challenges also remain. One is imprecision due to semantic ambiguity. In a Twitter corpus, for example, the word “apple” will sometimes refer to the brand, and in other cases to the fruit, yielding imprecise embedding representations of “apple.” Embedding representations can also differ depending on the type of documents used to learn them. A brand or product will likely be represented differently in a model trained using a corpus of financial reports, for example, versus one trained using a corpus of consumer reviews.

4 Using text in experiments

Most of the text analysis examples discussed so far used field data, but these tools can also be used in experiments. Indeed, papers on language-focused topics frequently employ mixed methods, incorporating text analysis in both field data and experiments (e.g., Packard et al., 2018). In experiments, text can be used as an independent variable, dependent variable, or mediator—and text analysis tools can assess text used in each of these ways.

Manipulating text as an independent variable is useful for researchers studying how senders are affected by producing certain language, or how receivers are affected by hearing certain language. To manipulate language production, researchers can give participants general instructions to follow (e.g., be persuasive; Rocklage et al., 2018a, 2018b). Alternatively, participants can be asked to complete a controlled, pre-scripted text that varies across conditions. For example, some participants complete sentences with explanations, while others complete sentences without them (e.g., Moore, 2012). To manipulate language that receivers are exposed to, researchers can construct texts that vary in specific ways and measure their impact on participants’ attitudes and behaviors (Lafreniere, Moore, & Fisher, 2022; Rocklage & Fazio, 2020). For example, participants could read researcher-created reviews for material versus experiential purchases to see which they rely upon more (Dai et al., 2020).

Examining text as a dependent variable is useful for exploring how language use or preferences vary under different conditions. For example, researchers have used text analysis tools to test how audience size (e.g., small vs. large), device type (e.g., mobile vs. desktop), or goals (e.g., persuasion) alter participant-generated text in terms of sentiment or emotionality (Barasch & Berger, 2014; Melumad et al., 2019; Rocklage et al., 2018a, 2018b). Alternatively, participants—as senders or receivers—may choose from researcher-created text that varies in controlled ways (e.g., Moore & McFerran, 2017; Schellekens et al., 2010). For example, senders might choose which of several sentences they would use when writing a review, while readers might choose which sentences would be more helpful when reading a review (Moore, 2015).

The tools described above can be used to assess text in experiments, whether it is used as an independent or a dependent variable. When text is manipulated as an independent variable, these tools can be employed to conduct manipulation checks. For example, participant-generated text can be checked to ensure that it varies as expected across experimental conditions (e.g., more positive emotion words when participants are assigned to write about a positive vs. negative purchase). Further, when researcher-created text is used, as either an independent or a dependent variable, tools can be used to ensure that these texts vary in terms of the language of interest (e.g., pronouns), but do not vary in other ways (e.g., sentiment; Moore & McFerran, 2017; Packard et al., 2018).
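As a concrete illustration, a simple text-based manipulation check might compare dictionary scores across conditions; in the hedged sketch below, the word list and responses are toy stand-ins for a validated dictionary (e.g., LIWC) and real participant data:

```python
# A sketch of a text-based manipulation check: do participants assigned to
# write about a positive purchase use more positive-emotion words? The word
# list and responses are illustrative stand-ins for a validated dictionary
# and real participant data.
from scipy import stats

positive_words = {"love", "loved", "great", "happy", "wonderful", "enjoyed"}

def pct_positive(text):
    """Percentage of words in `text` matching the positive-emotion list."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return 100 * sum(w in positive_words for w in words) / len(words)

positive_cond = ["I loved it, such a great and happy experience",
                 "Wonderful purchase, enjoyed it"]
negative_cond = ["The product broke after one day",
                 "Awful service, I want a refund"]

scores_pos = [pct_positive(t) for t in positive_cond]
scores_neg = [pct_positive(t) for t in negative_cond]
print(stats.ttest_ind(scores_pos, scores_neg))  # expect a higher positive mean
```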

Finally, when using text as a mediator, dictionary-based tools can be applied to participant-generated text designed to capture a hypothesized process. For example, Wu et al. (2019) conceptualized the proportion of other-focused pronouns (e.g., they, she) in participants’ open-ended responses as a reflection of attention to others and used this proportion as a mediating variable.

5 Conclusion

Language is part of almost every marketing interaction. Brands, consumers, and employees use language to communicate, persuade, and offer assistance. Consequently, by quantifying the insights hidden in language, automated textual analysis opens up a range of interesting research questions.

In this paper, we offer an accessible introduction to the three main approaches to automated text analysis, discussing how they can be used to extract meaning from text (i.e., the what, how, and why), and how these approaches might complement each other.

Dictionaries, for example, could be used to study why some products or services get talked about for longer than others (e.g., because more concrete words or emotional language is used), or how technology shapes communication. Topic modeling of online reviews could be used to explore drivers of consumer motivations and reasons for product or service failure. And embedding models, given their ability to account for context, could be used to study cultural differences (e.g., in gender bias or discrimination), or how brands evolve across different markets (e.g., how the representation of “Tesla” has changed over the past decade).

These three approaches can also be used to provide insight into the two key functions of language (Berger et al., 2020). First, language impacts the audiences that consume it. The words used by consumers, salespeople, or advertisements shape the attitudes and actions of the people who hear or read them. Packard and Berger (2021), for example, used a concreteness dictionary to show that when customer service agents use more concrete language, it boosts customer satisfaction; Berger and Packard (2018) used topic modeling to test ideas about the impact of atypicality or novelty on product success; and Wang, He, and Curry (2021) used word embeddings to identify which product attributes most impact consumer attitudes as expressed in online reviews.

Second, language also reflects things about the consumer, company, or culture that created it. What someone said, for example, provides insight into their personality and demographic characteristics, and company language sheds light on everything from attitudes towards customers to things like gender bias and discrimination. Proserpio, Troncoso, and Valsesia (2021), for example, used dictionaries to test whether responses from hotel management reflect a gender bias. Boghrati and Berger (2022) used embeddings and a quarter of a million songs spanning 60 years to explore whether gender bias has changed over time. And a combination of dictionaries, topic modeling, and embeddings was used to reveal how reviewers’ expressed attitudes reflect their personal motivations for sharing their opinions (Chakraborty et al., 2022).

Overall, the three approaches outlined can help researchers study how text both impacts audiences and reflects things about language producers. Language can be used to both understand (and predict) consumer behavior and other marketing outcomes, as well as gain insight into people and culture more generally. Hopefully the tools outlined here will help more researchers explore this exciting area.