1 Introduction

Microsoft’s share price has increased by more than 220 percent in the past five years, while the S&P 500 index has gained just over 70 percent. Experts assert that the leadership style of the new unconventional CEO, Satya Nadella, is one key to Microsoft's success. Nadella is transforming Microsoft; in public appearances, he communicates in a stand-out manner and advocates his vision for change. In other words, he acts as a charismatic figure inside and outside the firm he leads. This conflation of the person of the CEO and the success of the company they lead has a long tradition: Jack Welch stood for General Electric's revived success, much as Steve Jobs stood for Apple's early success. CEO selection processes at 850 U.S. companies revealed that firms were consistently attracted to individuals whose charisma impressed analysts and the public (Khurana 2002). However, what may seem superficial at first glance pays off for the firms that select charismatic managers. Charismatic managers shape how their firms are perceived in the markets (e.g. Fanelli et al. 2009), but most importantly, a manager's charisma drives a firm's performance at all levels, from the individual employee to the entire organization (e.g., Banks et al. 2017). However, while anecdotal encounters, such as that detailed above, offer impressive insight into the transformational impact that charismatic managers produce, and while the evidence supporting the effects of charismatic leadership is incontrovertible, there exists, nevertheless, a paucity of reliable, theoretically sound questionnaire instruments to measure managers’ charisma (Antonakis et al. 2016).

The roots of this void lie in the way managers' charisma has been conceptualized and operationalized so far. Managers’ demeanor causes their charismatic aura, but it only becomes visible through the resonance of their audience. Thus, charismatic signals merge with their social resonance and the narrative of the manager's persona. Existing conceptualizations of managers' charisma thus refer to the effect that charismatic managers have on their audience. In other words, existing conceptualizations are recursive and, therefore, endogenous, and so are the resulting operationalizations in the study of managerial charisma (MacKenzie 2003). In research and practice, managers' charisma is measured by questionnaires whose items measure the effects of managers' charisma instead of referring to the behaviors that constitute their charisma. For example, the widespread Conger-Kanungo Scale of Charismatic Leadership (Conger and Kanungo 1994) asks managers, among other items, to rate the statement: “I am an exciting speaker”. The Multifactor Leadership Questionnaire (Avolio and Bass 2004), the most prominent instrument used on charismatic leadership and a fixed element of leadership assessments and selection procedures, lists the item: “I display a sense of power and confidence.” The problem persists in recent psychometrics, with the General Charisma Inventory (Tskhay et al. 2018) asking respondents to rate statements such as “Is a good leader,” or “Has a strong presence.” These problems with existing questionnaires have led to harsh criticism of measuring charisma with this method (van Knippenberg and Sitkin 2013; Fischer and Sitkin 2023). Instead, alternative approaches to its measurement have been proposed, such as analysis or coding of archival material or recordings of managers (e.g., Jacquart and Antonakis 2015; Jensen et al. 2023; Tur et al. 2022).

However, despite the criticism of the reliance on questionnaires in leadership research (Banks et al. 2023), the use of questionnaires is indispensable in many fields of business research where charisma matters, most notably research in strategic management where, for example, executives are valuable informants (e.g., Kiss et al. 2022; Weller et al. 2020). Thus, the questionnaire method remains an integral part of business research; however, an instrument to measure charismatic leadership unbiasedly is still missing.

This is what I aim to achieve in this work: to develop a questionnaire that measures charismatic leadership based on the charismatic behaviors of managers. To do so, I undertook a series of ten distinct studies, arrayed into five steps, wherein I developed and validated an exogenous scale to measure leaders’ charisma, the Charismatic Leadership Tactics Scale (CLTS). Its nine items describe specific and concrete leader behaviors, and their development was inspired by the conceptualization of leaders’ charisma as “values-based, symbolic, and emotion-laden leader signaling” (Antonakis et al. 2016) and based on firm evidence for the charismatic leader behaviors that constitute that signaling (e.g., Bono and Ilies 2006; Maran et al. 2019). In the first step, I establish the factorial structure of the scale and its psychometric quality criteria. In the second step, I show the scale’s convergent, incremental, and discriminant validity and the self- and other-report agreement between managers and their direct subordinates. In the third step, I show the external or criterion-related validity of the scale and demonstrate that the scale is indeed related to objective measures of the behaviors its items ask about. Step four demonstrates the change sensitivity of the scale applied in a training program for managers and entrepreneurs. Finally, in step five, the questionnaire is translated into another language and tested with employees and managers. Over several steps, I also show that the scale predicts relevant leadership outcomes, such as effectiveness, equally well or better than established, widely used measurement instruments such as the transformational leadership scale from the Multifactor Leadership Questionnaire (MLQ, Bass and Avolio 1995; MLQ 5X-Short, Avolio and Bass 2004) or the Conger-Kanungo scale of charismatic leadership (CKS; Conger and Kanungo 1994, 1998).

By offering a brief scale to measure a leader's charisma, I contribute to business research in three meaningful ways. First, the CLTS is the first questionnaire to measure charisma by building on the recently established signaling approach to leader charisma (Antonakis et al. 2016). The signaling approach to charisma defines charisma as the sum of behavioral signals emanating from the leader. This conceptualization solves the endogeneity problem of existing questionnaires, yet no questionnaire has made leaders’ charisma measurable following this conceptualization. Second, because the scale's items ask about specific leader behaviors (e.g., Van Quaquebeke and Felps 2018), the questionnaire avoids recursively relating the items to the leadership outcomes that the questionnaire intends to predict (e.g., "Has a strong presence," GCI). Third, because the items in the questionnaire ask about behaviors that are neither positive nor negative by themselves, it prevents any conflation with the outcomes of managers' leadership (Alvesson and Einola 2019). For example, when I ask managers or their employees whether the manager is "a good leader" (General Charisma Inventory, GCI; Tskhay et al. 2018), the judgments are likely to be conflated with the actual outcomes of the managers' leadership that the questionnaire is meant to predict. Managers who deliver better results might thereby be perceived and judged by their followers as "entrepreneurial" and "inspiring" (CKS), "with optimism" and "full of confidence" (MLQ), or simply as "good leaders" (GCI). In contrast, the specification of a manager's behaviors is likely to be independent, both conceptually and in terms of valence, of the leadership outcomes to be predicted.

2 Conceptualization: charisma as a signaling process

Charisma matters for managers. For example, leaders' charisma shapes securities analysts' recommendations and forecasts of their firms' future performance (Fanelli et al. 2009), triggers more favorable coverage of their quarterly earnings calls by journalists (König et al. 2018, but see Fiset et al. 2021), boosts their informal leadership in social media (Tur et al. 2022), amplifies their brand's leadership in the marketplace among consumers (Wieser et al. 2021), tilts the scales in election results of national significance (Jacquart and Antonakis 2015), and, finally, gives policymakers a way to create compliance with even very far-reaching policies among the public in crises (Covid-19 pandemic; Jensen et al. 2023). It is this charismatic aura, according to a prominent meta-analysis of 76 empirical studies, that can help boost leaders to outstanding success at all organizational levels, from the individual to the whole organization (Banks et al. 2017). Astonishingly, the effect of a leader's charisma can even hold a candle to the single best-proven management practice for pushing employee performance: pay for performance (Jenkins et al. 1998; Merchant et al. 2018). Experimental evidence shows that a leader's charisma produces performance gains similar to those of financial rewards, yet with zero production costs (e.g., Antonakis et al. 2022).

Despite these compelling findings, the concept of charismatic leadership is facing fierce headwinds: it is said to be poorly conceptualized (van Knippenberg and Sitkin 2013; Yukl 1999) and to remain a "big man theory," and the social constructivist corner identifies it as a tool for "masculinist agency" (e.g., Joosse and Willey 2020). Nevertheless, in one stroke resolving all of these actual and putative shortcomings, Antonakis et al. (2016) introduced a re-conceptualization of leaders’ charisma by embedding, simply but elegantly, charismatic leadership into the elaborated propositions of signaling theory (Spence 2002) and conceptualizing it as "values-based, symbolic, and emotion-laden leader signaling" (Antonakis et al. 2016, p. 304). Put simply, this approach assumes that charismatic signaling gives aspiring candidates for group leadership an advantage in gaining followership. Once a coordination problem arises in a group, leadership is an adaptive solution to it, and that is where charisma gains its prominence (Grabo et al. 2017; Spisak et al. 2015). When candidates compete for the role of the group's leader, their signaling provides followers with cues about the candidate's leadership ability. Speeches, metaphors, gesturing, or simply eye contact act as honest signals that provide followers with reliable clues about the manager's cognitive sophistication, dedication, and other qualities that are critical to solving the challenge the group faces (Mio et al. 2005; Maran et al. 2019; Silvia and Beaty 2012; von Hippel et al. 2016). The sum of these signals makes up the charismatic appeal of leadership candidates, who thereby gain the favor of followers and emerge triumphant in the competition for leadership. These qualities, then, enable charismatic leaders to deliver better results for the group.

This approach focuses on the observable behaviors of leaders, paving the way for measuring leaders' charisma at a behavioral level without conflating it with their effect on the leaders' audience. By mapping charisma onto measurable behaviors, this conceptualization turns the distal construct of leaders' charisma right side up and positions it on firm, behavioral underpinnings. However, a questionnaire instrument to measure the charisma of managers in such a behavior-oriented way is still missing. This is the aim of the present study. I develop a short and psychometrically robust scale to measure charismatic behaviors in leaders. Therefore, in the first step, I identify distinct, empirically proven behavioral signals that constitute leaders' charisma and might be considered for use as items for a behaviorally conceptualized questionnaire to measure it.

3 Item development: What signals constitute a leader’s charisma?

Building on the signaling perspective on leaders' charisma, I will develop a questionnaire measure that avoids the classical pitfalls of endogeneity and a range of rater biases. To do so, I will design items that, due to their specific mapping onto behavioral signals of charisma, are far less affected by the perceptual and conceptual biases that plague current questionnaires and stem either from raters’ inference or from the fundamental overlap between the measure and its outcome (Antonakis et al. 2016; Fischer and Sitkin 2023; van Knippenberg and Sitkin 2013; Yukl 1999), offering future research a robust measure of leaders’ charisma. The simplest way to develop such a psychometric tool is to create a list of behaviors that have been robustly linked to leaders’ charisma in previous research, that simultaneously serve an empirically proven or theoretically derivable signaling function, and that, therefore, allow us to assess charisma as an exogenous variable. For example, the new scale asks raters to evaluate to what extent the rated manager "uses a metaphorical language" or “tells stories to make a point” (Antonakis et al. 2016, p. 309; Wang and Seibert 2015). This contrasts with established questionnaires, which aim to measure derivative outcomes of charismatic behaviors, or at best, impressions of charisma itself, with all the corresponding repercussions (Fischer and Sitkin 2023).

To develop the items, I first reviewed these behavioral components of leader charisma in Table 1, described the supporting evidence for their charismatic effect, hypothesized their signaling function, and used them to develop items for the scale. From this selective overview of findings on tactics in leader communication, I identified nine charismatic tactics that can be consistently described as signals of a leader's charisma (Grabo et al. 2017). The number of no more than nine items is intended to ensure the optimum per-item validity (Soto and John 2019). Critical criteria for selecting the behaviors were: first, the behavior must be distinctively observable; second, the behavior must be communicative; and third, the behavior must serve a signaling function, providing benefits to both sender and receiver. Notably, the signaling function of behavior arises from the fact that it needs to reveal honest information about the leadership ability of the managers who sent the signal, and therefore, not be arbitrary, but rather costly to produce so that it cannot be shown by everyone, but only those managers in possession of that leadership ability (Antonakis et al. 2016; Grabo et al. 2017).

Table 1 Overview of findings on different charismatic leadership tactics with classification into verbal and nonverbal domains, supporting evidence for the beneficial effect of signals for managers, hypothesized signaling function, which gives advantages to sender (managers) and receiver (employees and others) and therefore makes followers act upon it, and development of items for the Charismatic Leadership Tactics Scale (CLTS)

For example, one item measures the use of metaphors ("uses a metaphorical language"). Using image-based words is a key aspect of a leader's success, resulting in ascriptions of charisma and greatness (e.g., Naidoo and Lord 2008; Emrich et al. 2001). Employees should also remember whether their manager is more likely to say "Try hard!" to them or more often uses pictorial phrases like "Put your hearts into it!". It is high intelligence that enables the production of pictorial language. Therefore, it is hard to fake, so using metaphors is an honest signal of cognitive sophistication (Silvia and Beaty 2012; Beaty and Silvia 2013). Higher cognitive ability is a critical predictor of leader effectiveness (Judge et al. 2004; Antonakis et al. 2017; Antonakis et al. 2022) and is thus a valuable attribute that lends credibility to an aspirant’s possession of leadership abilities. Seen also through the lens of biology, metaphor use fulfills the function of a signal because it is honest, costly to produce, and gives an advantage to both the sender and receiver (e.g., Higham 2014). It tells employees about a characteristic of managers that is key to the success of their leadership, i.e., intelligence (e.g., Antonakis et al. 2022), which in turn earns the manager their approval, turning an audience into followers and allowing the leader to lead more effectively (e.g., Emrich et al. 2001). The use of metaphors can, therefore, be classified as a signal of leaders' charisma.

Put concisely, all behaviors queried via the items in an easily comprehensible form were selected according to these criteria. Table 1 lists each of these charismatic signals and describes (1) a narrative summary of the evidence on the effects of these leader behaviors on recipients, (2) assumptions and supporting evidence on the signaling function of the leader behaviors, and (3) the wording of the items for the scale being tested.

Building on this signaling approach to leader charisma, I circumvent the conceptual pitfalls inherent to charisma conceptualizations behind existing questionnaire measures (van Knippenberg and Sitkin 2013; Yukl 1999). By inquiring about the frequency with which a manager uses charismatic signals that are specific and neutral to themselves, I should also be spared the biases suffered by the very questionnaires whose item formulations query successful outcomes of the leadership process rather than the constituent behaviors (Alvesson and Einola 2019; Antonakis et al. 2016). However, as a trial by fire for such a scale of charismatic tactics, the next step is to subject it to rigorous testing for its psychometric quality and vigor in practical settings in order to prove worthy of effectively measuring one of the most relevant constructs in leadership science: leaders’ charisma (Weber 1982).

4 Overview of studies

Across five steps and ten studies, I put the instrument to the litmus test, following as rigorously as possible the gold standard for scale development in leadership science and beyond (see Table 2; Clark and Watson 2019; Crawford and Kelder 2019; Wright et al. 2017). In this case, rigorous scale development is all the more important because the scale builds on a conceptualization of leaders' charisma that was born out of the criticism of the use of questionnaires (Antonakis et al. 2016) and coincides with a time when the call for a stronger focus on concrete leader behaviors in leadership research is growing louder (Banks et al. 2023). Therefore, beyond this scale’s classic psychometric trial by fire, special attention will be paid to its external validity, or more precisely, whether the scale really measures the charismatic leader behaviors it claims to assess.

Table 2 Overview of the five steps and ten studies used to validate the Charismatic Leadership Tactics Scale (CLTS)

In five steps, I tested the overall psychometric quality of the CLTS with 681 managers, 625 employees, and 330 additional study participants (see Table 2). In step 1, I tested the questionnaire’s factorial structure and psychometric properties in managers (self-report; study 1) and employees (observer-report; study 2). In step 2, I replicated the factorial structure of the instrument for managers and their teams within a multi-level design; I tested for self-other agreement, as well as incremental validity against other instruments for measuring charisma in predicting the extra effort of the team members being led (study 3). In the same step, I tested the convergent and discriminant validity of the questionnaire in another sample of managers (study 4) against scales from established questionnaires on leader behaviors and leadership styles (Conger-Kanungo charismatic leadership questionnaire, Conger and Kanungo 1994, 1998; Multifactor Leadership Questionnaire 5X-short, Avolio and Bass 2004; Managerial practices survey, Yukl 1990; Leadership Behavior Description Questionnaire, Stogdill et al. 1962; Managerial Behavior Instrument, Lawrence et al. 2009).

One of the most important aims is to test criterion-related validity to answer whether the scale measures these charismatic tactics and, equally important, whether it fits into a network of critical variables related to leaders' charismatic signaling as defined by its conceptualization. That is what I do in step 3. First, I test whether charismatic tactics, as seen in video recordings of political leaders by observers and rated using the CLTS, correspond to leaders' behaviors as measured by automated software and manually coded by independent experts (study 5). Since signals of charismatic leadership are supposed to be honest, thus, to provide information about valuable leadership abilities, and at the same time to be costly to produce, I link the scale to cognitive abilities that support the production of these very charismatic signals (study 6). Leadership is influence, which is directly expressed in persuading others, so I employed a negotiation paradigm (Pinkley et al. 1994) to test whether the scale is related to negotiation success (study 7).

In step 4, I test whether the scale is sensitive to changes in charismatic tactics by utilizing it with peers and followers of leaders who participated either in charismatic leadership training or a control intervention (study 8). Last, in step 5, I replicate the scale’s psychometric properties, factorial structure, and aspects of convergent validity in a foreign language in both managers (self-report; study 9) and employees (observer-report; study 10).

5 Step 1: Testing the factorial structure and psychometric properties

5.1 Study 1: Leaders’ self-rated charismatic leadership tactics

The first study assessed the factorial structure and psychometric properties of the proposed charismatic leadership tactics scale in managers of private, small and medium-sized firms (< 250 employees) in Germany, Austria, Switzerland, and Liechtenstein. The sample consisted of 141 managers and executives (17.8% female), their ages ranging from 19 to 66 years, Mage = 44.24, SD = 12.25 (in three larger companies from the consulting services sector, it was not permitted to capture data on age and gender), over 87% of which came from the construction industry, financial services, and consulting services. Participants had a median leadership experience of 10 to 15 years (see Supplementary Information for further details).

The participating managers completed the Charismatic Leadership Tactics Scale (CLTS), together with questions on socio-demographics, management experience, and the characteristics of their firm. They provided ratings on the frequency with which they typically employ the nine charismatic leadership tactics included in the CLTS on a 5-point Likert scale (1 = almost never, 5 = almost always; see Table 1; see Supplementary Information for further details and the scale instruction).

I first employed a maximum likelihood exploratory factor analysis (EFA) with a Promax rotation to assess the factorial structure of the CLTS. The Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy for factor analysis was sufficient at 0.767 (Kaiser 1970), and the Bartlett test showed the required significance (Bartlett 1950) at χ2(36) = 286.60, p < 0.001. The Kaiser-Guttman criterion indicated two factors with Eigenvalues at 3.24 and 1.24, while Scree Plot and Parallel Analysis (Hayton et al. 2004; Lim and Jahng 2019) supported a single factor. A two-factor solution showed the first factor to explain 27.19% and the second to explain 11.36% of the variance, with both factors correlating at r = 0.47. However, the three items loading on the first factor could not adequately be interpreted, and two items of the scale did not adequately load on any factor. An alternative single factor, on the other hand, explained 28.79% of the variance. Factor loadings on the single factor were at 0.79 for gestures, 0.70 for facial expressions, 0.61 for metaphorical language, 0.51 for storytelling, 0.47 for vision, 0.42 for smiling, 0.40 for rhetorical questions, 0.39 for focused gaze, and 0.38 for contrasts.
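As a plausibility check on the single-factor figures, the share of variance a single factor reproduces can be approximated by the mean squared loading, and the degrees of freedom of the Bartlett test follow from the number of items as p(p − 1)/2. The sketch below uses the nine loadings reported above; the function names are illustrative and not part of the original analysis, and the loadings are rounded to two decimals, so the result (about 28.9%) only approximates the reported 28.79%.

```python
# Single-factor loadings reported for Study 1 (rounded to two decimals).
LOADINGS_STUDY1 = {
    "gestures": 0.79,
    "facial expressions": 0.70,
    "metaphorical language": 0.61,
    "storytelling": 0.51,
    "vision": 0.47,
    "smiling": 0.42,
    "rhetorical questions": 0.40,
    "focused gaze": 0.39,
    "contrasts": 0.38,
}


def bartlett_df(n_items: int) -> int:
    """Degrees of freedom of Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


def variance_explained(loadings) -> float:
    """Proportion of total item variance reproduced by one factor
    (mean of the squared standardized loadings)."""
    values = list(loadings)
    return sum(value * value for value in values) / len(values)


print(bartlett_df(len(LOADINGS_STUDY1)))                      # 36, as in chi2(36)
print(round(100 * variance_explained(LOADINGS_STUDY1.values()), 1))
```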

To further compare these two possible solutions and to facilitate a decision on which factorial structure to retain, I conducted confirmatory factor analyses (CFA; Hu and Bentler 1999) on the single-factor and two-factor solutions. I also proposed a theory-driven competing two-factor model by separating the verbal and nonverbal tactics as distinct latent variables (see Table 1). I calculated the model fit using maximum likelihood estimates in SPSS AMOS (Version 26). As descriptive measures for the overall model fit, I report χ2/df (sufficient fit ≤ 3; good fit ≤ 2), RMSEA (sufficient fit ≤ 0.08, good fit ≤ 0.05), and SRMR (sufficient fit ≤ 0.10, good fit ≤ 0.05). CFI and TLI (sufficient fit ≥ 0.95, good fit ≥ 0.97) measure the improvement in model fit compared to the independence model (Browne and Cudeck 1993; Hu and Bentler 1999). To allow for a direct comparison between the competing models, I report the Bayesian Information Criterion (BIC) and the Consistent Akaike Information Criterion (CAIC) as goodness-of-fit measures (Nylund et al. 2007; Preacher and Merkle 2012). Lower values of both measures indicate a better fit, and a difference in the BIC value of at least 10 indicates a substantial improvement in data fit (Raftery 1995). To compare the competing models further, I report a chi-square difference test between each reported model and the best-fitting model. If suggested by the modification indices, I allowed for covariance between error terms within their respective latent factors.
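The cutoffs listed above can be written down as a small lookup that labels each index as good, sufficient, or insufficient. This is only a sketch of the decision rules as stated in the text; the function names and the example values are hypothetical and do not come from any of the studies.

```python
def rate_lower_is_better(value: float, good: float, sufficient: float) -> str:
    """Label an index for which smaller values mean better fit."""
    if value <= good:
        return "good"
    if value <= sufficient:
        return "sufficient"
    return "insufficient"


def rate_higher_is_better(value: float, good: float, sufficient: float) -> str:
    """Label an index for which larger values mean better fit."""
    if value >= good:
        return "good"
    if value >= sufficient:
        return "sufficient"
    return "insufficient"


def rate_fit(chi2: float, df: int, rmsea: float, srmr: float,
             cfi: float, tli: float) -> dict:
    """Apply the cutoffs named in the text to one fitted model."""
    return {
        "chi2/df": rate_lower_is_better(chi2 / df, good=2.0, sufficient=3.0),
        "RMSEA": rate_lower_is_better(rmsea, good=0.05, sufficient=0.08),
        "SRMR": rate_lower_is_better(srmr, good=0.05, sufficient=0.10),
        "CFI": rate_higher_is_better(cfi, good=0.97, sufficient=0.95),
        "TLI": rate_higher_is_better(tli, good=0.97, sufficient=0.95),
    }


# Hypothetical model, for illustration only.
example = rate_fit(chi2=40.0, df=24, rmsea=0.06, srmr=0.04, cfi=0.96, tli=0.94)
for index, label in example.items():
    print(f"{index}: {label}")
```

Keeping the thresholds in one place makes it easy to see that a model can be "good" on one index and merely "sufficient" or worse on another, which is why several indices are reported side by side.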

The single-factor model showed a good and overall best fit with the data (χ2(24) = 18.28, p = 0.789, χ2/df = 0.762; CFI = 1.000; TLI = 1.033; RMSEA < 0.001; SRMR = 0.043; BIC = 122.201; CAIC = 143.201). The theory-driven model differentiating between items regarding verbal and nonverbal charismatic leader tactics showed a substantially worse fit (χ2(26) = 44.48, p = 0.013, χ2/df = 1.711; ∆χ2(2) = 26.20, p < 0.001; CFI = 0.929; TLI = 0.901; RMSEA = 0.071; SRMR = 0.062; BIC = 138.501; ΔBIC > 10; CAIC = 157.501), and the two-factor solution proposed by the EFA showed the worst fit (χ2(26) = 47.63, p = 0.006, χ2/df = 1.832; ∆χ2(2) = 29.35, p < 0.001; CFI = 0.916; TLI = 0.884; RMSEA = 0.077; SRMR = 0.063; BIC = 141.652; ΔBIC > 10; CAIC = 160.652).
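The information criteria can be cross-checked from the reported χ2 values. The sketch below assumes the definitions BIC = χ2 + q·ln(N) and CAIC = χ2 + q·(ln(N) + 1), with q taken as the number of distinct sample moments of nine items (45) minus the model's degrees of freedom, and N = 141 from the sample description. These definitions are an assumption about how the software computes the indices, not something stated in the text; under them, the values for the first two models are reproduced to within rounding.

```python
import math

N_STUDY1 = 141  # managers in Study 1
N_ITEMS = 9     # CLTS items


def free_parameters(n_items: int, df: int) -> int:
    """Free parameters q = distinct variances/covariances minus model df."""
    return n_items * (n_items + 1) // 2 - df


def bic(chi2: float, q: int, n: int) -> float:
    """Assumed definition: chi-square plus q * ln(N)."""
    return chi2 + q * math.log(n)


def caic(chi2: float, q: int, n: int) -> float:
    """Assumed definition: chi-square plus q * (ln(N) + 1)."""
    return chi2 + q * (math.log(n) + 1)


# (chi-square, df) as reported for two of the Study 1 models.
models = {
    "single factor": (18.28, 24),
    "verbal / nonverbal": (44.48, 26),
}

for name, (chi2, df) in models.items():
    q = free_parameters(N_ITEMS, df)
    print(f"{name}: q = {q}, BIC = {bic(chi2, q, N_STUDY1):.2f}, "
          f"CAIC = {caic(chi2, q, N_STUDY1):.2f}")
```

Because the penalty term grows with q, a model with fewer free parameters can still show the higher BIC when its χ2 is much larger, which is the pattern seen for the verbal/nonverbal model.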

Lastly, based on this compelling support for a single-factor solution, I assessed the internal consistency of the measure with McDonald’s Omega (Cortina et al. 2020; Wulff et al. 2023), computed using the OMEGA macro for SPSS (Hayes and Coutts 2020), and with the composite reliability based on the CFA. Both coefficients were 0.77 (ω = 0.77; composite reliability = 0.77), indicating good reliability. These findings from EFA and CFA regarding the factor structure of the CLTS suggest that a unifactorial structure best represents the items, thus indicating that the CLTS should be utilized as a unidimensional scale.
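For a unidimensional scale with standardized loadings and uncorrelated errors, omega and composite reliability reduce to the same expression, (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The sketch below applies this formula to the rounded single-factor EFA loadings reported for Study 1. This is an approximation under the stated assumptions and not the OMEGA macro or the CFA-based computation used in the study; the function name is illustrative.

```python
def omega_from_loadings(loadings) -> float:
    """Omega / composite reliability for one factor, assuming standardized
    loadings and uncorrelated error terms."""
    lam = list(loadings)
    common = sum(lam) ** 2                            # (sum of loadings)^2
    unique = sum(1 - value * value for value in lam)  # sum of error variances
    return common / (common + unique)


# Rounded single-factor loadings from the Study 1 EFA.
study1_loadings = [0.79, 0.70, 0.61, 0.51, 0.47, 0.42, 0.40, 0.39, 0.38]

print(round(omega_from_loadings(study1_loadings), 2))
```

With these inputs the formula lands on the same two-decimal value as the coefficients reported above, which suggests the rounded loadings carry enough information for a rough reliability check.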

5.2 Study 2: Followers’ perceptions of managers’ charismatic leadership tactics

In Study 2, I further extend the findings on the factorial structure and the psychometric properties of the CLTS by acquiring a sample of employees in the same target group of firms as in Study 1 (see Supplementary Information for more details on the recruitment procedure). The sample consisted of 248 followers (42.2% female) from the same organizations as the leaders in Study 1. Their age ranged from 19 to 58 years, Mage = 35.78, SD = 11.43.

I conducted the same data analysis procedures detailed in Study 1. The Kaiser–Meyer–Olkin measure of 0.80 indicated an adequate sample, and the Bartlett test remained significant. The EFA was again ambiguous regarding the factorial structure. In this instance, the Kaiser-Guttman criterion and the Parallel Analysis supported a two-factor solution with Eigenvalues at 3.07 and 1.21 and explained variance at 27.43% and 5.98%. The Scree Plot, on the other hand, again favored a single factor, explaining 26.69% of the variance. The two factors correlate at r = 0.61 and comprise five and four items, respectively. Only three items of the first factor and two items of the second factor corresponded with the respective factors suggested by the EFA in Study 1. Factor loadings for the single factor were at 0.75 for facial expressions, 0.69 for gestures, 0.58 for metaphorical language, 0.48 for storytelling, 0.46 for vision, 0.44 for focused gaze, 0.41 for rhetorical questions, 0.37 for contrasts, and 0.27 for smiling. The low value for smiling could be because, although this tactic is an established component of charismatic leadership, it is not specific, i.e., exclusive to charismatic managers, but may also be observed in less charismatic managers.

I again resorted to a CFA to clarify things and gain a conclusive understanding of the factorial structure of the data. Significant differences in the chi-square value and the BIC difference exceeding 10 indicate a substantially better fit of the model with the respective lower values. Again, I proposed three competing models consisting of a single factor, two factors based on the EFA, or a distinct factor for the verbal and nonverbal tactics. Both the one-factorial (χ2(24) = 35.49, p = 0.061, χ2/df = 1.479; CFI = 0.970; TLI = 0.955; RMSEA = 0.044; SRMR = 0.046; BIC = 151.274; CAIC = 172.274) as well as the two-factorial model based on the EFA showed a similar and overall good fit with the data (χ2(25) = 33.62, p = 0.116, χ2/df = 1.345; ∆χ2(1) = 1.87, p = 0.171; CFI = 0.977; TLI = 0.968; RMSEA = 0.037; SRMR = 0.043; BIC = 143.890; ΔBIC < 10; CAIC = 163.890). The model differentiating verbal and nonverbal tactics, however, fit the data marginally less well (χ2(25) = 38.78, p = 0.039, χ2/df = 1.551; ∆χ2(1) = 3.29, p = 0.070; CFI = 0.964; TLI = 0.948; RMSEA = 0.047; SRMR = 0.048; BIC = 149.050; ΔBIC < 10; CAIC = 169.050).
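The chi-square difference tests above can be reproduced with the standard library alone, because the upper-tail probability of a chi-square variable has a closed form for one degree of freedom (erfc(√(x/2))) and for two (exp(−x/2)). The sketch below covers only these two cases; the helper names are illustrative, and the inputs are the χ2 and df values reported for the Study 2 models.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable; closed forms for df 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def chi2_difference(chi2_a: float, df_a: int, chi2_b: float, df_b: int):
    """Return (delta chi-square, delta df, p) for two competing models."""
    delta = abs(chi2_a - chi2_b)
    delta_df = abs(df_a - df_b)
    return delta, delta_df, chi2_upper_tail(delta, delta_df)


single = (35.49, 24)            # one-factor model, Study 2
efa_two_factor = (33.62, 25)    # two-factor model based on the EFA
verbal_nonverbal = (38.78, 25)  # verbal vs. nonverbal model

for label, model in [("EFA two-factor", efa_two_factor),
                     ("verbal/nonverbal", verbal_nonverbal)]:
    delta, delta_df, p = chi2_difference(*single, *model)
    print(f"{label}: delta chi2({delta_df}) = {delta:.2f}, p = {p:.3f}")
```

For larger differences in degrees of freedom, a general chi-square distribution function from a statistics library would be needed instead of these closed forms.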

The results of the CFA show no discernible difference between the single and the two-factor model proposed by the EFA. However, as these two factors do not entail a common interpretable theme, I decided to stick with the single-factor solution supported by Study 1 and the Scree Plot. This solution yielded a sufficient McDonald’s ω = 0.75 and a composite reliability of 0.75.

6 Step 2: Proving self-other agreement, convergent, discriminant and incremental validity

6.1 Study 3: Leader–follower agreement on perceptions of charismatic leadership tactics and incremental validity against existing measures

In this stage, the aim was to investigate whether leaders' self-reported perception of their use of charismatic tactics aligns with the reported perception of their subordinates as observers within a typical multilevel design prevalent in leadership research. Furthermore, its convergent and incremental validity compared to an established effect-centric measure of charismatic leadership, the MLQ 5X Short (Avolio and Bass 2004), was tested, with followers’ extra effort as an outcome measure. Data from 72 leaders (29.1% female) aged between 22 and 60 years, Mage = 44.67, SD = 10.64, were analyzed. Leaders had a mean leadership experience of 13.45 years (SD = 9.81) and led a mean of 15.33 followers (SD = 28.95). Each of the leaders was rated by two of their direct subordinates (51.2% female), resulting in a total sample of 144 followers. Participants were mainly employed at organizations in the health (11.1%), technology (9.7%), and construction (6.9%) sectors (see Supplementary Information for more details on the recruitment procedure).

In addition to general information about the person and the firm, the entire survey for the participants consisted of the CLTS, a selection of items from the MLQ 5X-Short that capture the charismatic effect of leaders, and a further selection of items that capture extra effort (all rated on a 5-point Likert scale, 1 = strongly disagree, 5 = strongly agree). Leader- (McDonald’s Omega, ω = 0.77) and follower ratings (McDonald’s Omega, ω = 0.71) of the charismatic tactics exhibited by each leader were measured using the CLTS. Based on the approach of Towler (2003), 12 items of the transformational leadership scale were selected to specifically measure leaders’ charismatic effect on followers (MLQ 5X-Short; Avolio and Bass 2004; German translation by Felfe 2006; McDonald’s Omega at ω = 0.82 for the self-, and ω = 0.83 for the follower-ratings). Last, the extra effort of followers in their unit was measured by a selection of four items (ω = 0.68), two each from the extra effort subscale of the MLQ 5X-Short (Avolio and Bass 2004; German translation by Felfe 2006) and the organizational citizenship behavior checklist (OCB-C-10; Spector et al. 2010).

As in Study 2, I initially conducted an EFA and CFA, corroborating the previously identified factor structure of the CLTS. Furthermore, based on an evaluation of the BIC and CAIC values, an additional CFA demonstrated the independence of the CLTS from the effect-centric measure based on the MLQ 5X-Short (see Supplementary Information for these findings). To assess the agreement between leaders’ self-ratings and their followers’ observer ratings, as well as the convergent validity of the CLTS, I further calculated Pearson product-moment correlation coefficients (see Supplementary Table 1). I report correlations as r [± 0.10 = small effect; ± 0.30 = medium effect; ± 0.50 = large effect]. Leaders' self-ratings corresponded moderately to highly with their followers’ ratings of their charismatic leadership tactics (r = 0.49, p < 0.001; see Supplementary Table 1 and Fig. 1), indicating a substantial self-other agreement. Further, self-ratings of the charismatic leadership tactics were strongly related to leaders’ charismatic effect (r = 0.61, p < 0.001) and moderately to follower-rated extra effort (r = 0.42, p < 0.001). Follower ratings on the CLTS also related to follower ratings of leaders’ charisma (r = 0.47, p < 0.001) and again to follower ratings of extra effort (r = 0.39, p = 0.001). These findings confirm the self-other agreement and the convergent validity of the CLTS.
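The effect-size benchmarks stated above can be applied mechanically to the reported coefficients. The following minimal Python sketch does so; the function name, the short labels for each correlation, and the "negligible" category for |r| < 0.10 are assumptions introduced here for illustration and are not part of the original analysis.

```python
def effect_size_label(r: float) -> str:
    """Classify |r| with the benchmarks given in the text (0.10 / 0.30 / 0.50)."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # assumed label; the text names no category below 0.10


# Coefficients reported for this study; the dictionary keys are hypothetical labels.
reported = {
    "CLTS self vs. CLTS follower": 0.49,
    "CLTS self vs. charismatic effect": 0.61,
    "CLTS self vs. follower extra effort": 0.42,
    "CLTS follower vs. follower-rated charisma": 0.47,
    "CLTS follower vs. follower extra effort": 0.39,
}

labels = {name: effect_size_label(r) for name, r in reported.items()}
for name, label in labels.items():
    print(f"{name}: r = {reported[name]:.2f} ({label})")
```

Under these cut-offs, the self-other agreement of r = 0.49 falls just below the threshold for a large effect, which is consistent with the description "moderately to highly".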

Fig. 1

The figures show the associations of the CLTS with relevant other variables across all studies: a, b, and c for study 3; d, e, and f for study 5; g, h, and i for study 6; j, k, and l for study 7; m for study 9; and n for study 10

Lastly, to assess whether the new measure provides incremental validity compared to the MLQ 5X-Short, a hierarchical linear regression was performed, entering first the selection of items assessing leaders’ charismatic effect and second the CLTS as predictors of followers’ extra effort, rated by themselves or their leaders. Self-rated leaders’ charisma explained variance in their followers’ extra effort (β = 0.40, R2 = 0.16, F(1,70) = 13.55, p < 0.001), yet the addition of the CLTS to the model did increase the amount of variance explained (∆R2 = 0.05, ∆F(1,69) = 4.41, p = 0.039). The measure (β = 0.28, p = 0.039) surpassed the MLQ selection, reducing its weight to non-significance (β = 0.23, p = 0.092). When employing follower ratings as the source for all three variables, leaders’ charisma again predicted the follower-reported extra effort (β = 0.40, R2 = 0.15, F(1,70) = 13.06, p = 0.001). The inclusion of the CLTS (β = 0.27, p = 0.030), however, could again explain further variance beyond the MLQ (β = 0.27, p = 0.028), thus indicating the incremental validity of the measure (∆R2 = 0.06, ∆F(2,69) = 4.89, p = 0.030).
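The F-change statistic in such a two-step regression follows directly from the two R2 values. The sketch below uses the standard formula for nested OLS models; the function and variable names are hypothetical, and because the inputs are the rounded values reported above, the result only approximates the reported ∆F.

```python
def f_change(r2_reduced: float, r2_full: float, n_added: int, df_residual: int) -> float:
    """F test for the increase in R^2 when n_added predictors enter a nested model.

    F = (delta R^2 / n_added) / ((1 - R^2_full) / df_residual),
    where df_residual belongs to the full model.
    """
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / n_added) / ((1.0 - r2_full) / df_residual)


# Self-rating model: R^2 = 0.16 in step 1, delta R^2 = 0.05 after adding the CLTS,
# 72 leaders and 2 predictors give 72 - 2 - 1 = 69 residual degrees of freedom.
f_self = f_change(r2_reduced=0.16, r2_full=0.16 + 0.05, n_added=1, df_residual=69)
print(f"Delta F(1, 69) from rounded inputs: {f_self:.2f}")
```

With the rounded inputs this yields a value close to, but not identical with, the reported ∆F(1,69) = 4.41; the gap is what one would expect from rounding R2 and ∆R2 to two decimals.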

This multilevel study thus achieved multiple goals. Firstly, the findings again indicate a single factor that best describes leaders’ charismatic behaviors. Secondly, results support the independence of the CLTS from the well-established but outcome- or effect-centric measure of leaders’ charisma based on the MLQ 5X-Short (Avolio and Bass 2004; German translation by Felfe 2006). Thirdly, I could confirm leaders’ self- and their followers’ observer ratings to correspond on a moderate to high level with each other, therefore indicating a strong self-other-agreement of leaders’ and followers’ perception of charismatic leadership tactics measured by the CLTS.

Lastly, findings support the scale’s convergent and criterion-related validity by showing a relationship to an established measure of leaders’ charisma and, most importantly, followers’ extra effort as an essential outcome of charismatic leadership. Findings on the incremental validity of leaders’ self-ratings on the CLTS compared to the item selection from the MLQ 5X-Short indicated that the CLTS largely shared variance in explaining followers’ extra effort with the established measure of charismatic leadership. However, when relating follower ratings of charismatic leadership to their extra effort, the CLTS explained unique variance beyond the effect-centric measure, thus substantiating its incremental value.

6.2 Study 4: Convergent and divergent validity with existing leadership questionnaires

Study 4 compares the CLTS to established scales assessing elements of charismatic leadership to account for its convergent and divergent validity. Furthermore, I aimed to generate insights into its relation to diverging or unrelated leadership behaviors to assess the scale’s discriminant validity.

Using the same approach as before, 160 leaders (30.6% female, Mage = 37.34, SD = 11.66, range 19–66), particularly from the financial, technology, and manufacturing industries, participated in this study (see Supplementary information for further details). In the survey, leaders’ self-ratings of their utilization of charismatic leadership tactics (ω = 0.82) were collected as detailed in previous studies. Additionally, to test the scale’s convergent and discriminant validity, further established measures assessing charisma-related and general leader behaviors, as well as unrelated and ineffective leader behaviors, were included in the survey (all items were rated on a 7-point Likert scale).

To measure charismatic leadership and charisma-related leader behaviors, I first employed the strategic vision and articulation (7 items; ω = 0.88), sensitivity to the environment (4 items; ω = 0.82), sensitivity to member needs (3 items; ω = 0.67), unconventional behaviors (3 items; ω = 0.53), and personal risk-taking (3 items; ω = 0.73) subscales from the Conger-Kanungo Charismatic Leadership Scale (Conger et al. 1997). Second, I included the idealized influence attributed (ω = 0.81), idealized influence behavior (ω = 0.86), and inspirational motivation (ω = 0.85) subscales from the MLQ 5X Short (4 items each; Avolio and Bass 2004; German translation by Felfe 2006). Third, I used the envisioning subscale (4 items; ω = 0.82) from the Managerial Practices Survey (MPS; Yukl 2012). Fourth, I used the persuasion subscale (10 items; ω = 0.90) from the Leadership Behavior Description Questionnaire (LBDQ XII; Stogdill et al. 1962). Fifth, I assessed the ability to inspire people to exceed expectations (3 items; ω = 0.78) with the Managerial Behavior Instrument (Lawrence et al. 2009).

As measures for variables distinct from charismatic leadership but still exemplifying effective leadership tactics, I included individual consideration (ω = 0.87), intellectual stimulation (ω = 0.84), contingent reward (ω = 0.86), and active management by exception (ω = 0.67) from the MLQ 5X-Short (4 items each). These were extended by the monitoring operations (ω = 0.80), clarifying (ω = 0.87), and planning activities (ω = 0.74) subscales from the MPS (4 items each), as well as the production emphasis aspect of the LBDQ (10 items; ω = 0.80). Lastly, I included the passive management by exception (ω = 0.81) and laissez-faire (ω = 0.79) subscales from the MLQ 5X Short (4 items each) as variables that should not or even negatively be related to charismatic leadership.

I computed Pearson’s product-moment-correlation coefficients between the CLTS and the established constructs assessing charismatic leadership or essential aspects related to it. Results showed that the CLTS was well related to the established charisma-related instruments (mean rOlkin & Pratt = 0.63; see Supplementary Table 2). In more detail, it corresponded highly with the Conger-Kanungo subscales personal risk (r = 0.36, p < 0.001), sensitivity to members’ needs (r = 0.54, p < 0.001), sensitivity to the environment (r = 0.54, p < 0.001), strategic vision and articulation (r = 0.79, p < 0.001), and unconventional behavior (r = 0.63, p < 0.001), as well as with the MLQ subscales idealized influence attributed (r = 0.63, p < 0.001), idealized influence behavior (r = 0.64, p < 0.001), and inspirational motivation (r = 0.70, p < 0.001). Furthermore, the persuasion dimension of the LBDQ (r = 0.70, p < 0.001), the envisioning dimension of the MPS (r = 0.73, p < 0.001), and the “inspiring people to exceed expectations” aspect of the MBI (r = 0.67, p < 0.001) were highly correlated with the new measure. I found slightly lower correlations (mean rOlkin & Pratt = 0.56) for the charisma-adjacent scales of individual consideration (r = 0.57, p < 0.001), intellectual stimulation (r = 0.61, p < 0.001), contingent reward (r = 0.59, p < 0.001), management by exception active (r = 0.35, p < 0.001), management behaviors of production emphasis (r = 0.57, p < 0.001), monitoring operations (r = 0.61, p < 0.001), clarifying (r = 0.60, p < 0.001), and planning activities (r = 0.60, p < 0.001). These results indicate that it substantially covers all relevant aspects of charismatic leadership, thus supporting its convergent validity while not overlapping with managerial practices not directly related to charisma. 
Additionally, the CLTS did not correspond (mean rOlkin & Pratt = -0.01) to ratings on the MLQ subscales representing a passive leadership style, namely management by exception passive (r = -0.01, p = 0.928) and laissez-faire (r = -0.01, p = 0.905), which supports its discriminant validity. These findings could be replicated, even when controlling for age and gender in partial correlations (see Supplementary Information).
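The mean correlations reported above rest on the Olkin and Pratt correction for the small-sample bias of r. The sketch below illustrates one way to compute such means. It assumes the commonly cited form G(r) = r · 2F1(1/2, 1/2; (n − 2)/2; 1 − r²) with n as the sample size (some presentations define n differently), followed by an unweighted average; the exact averaging procedure used in the study is not stated, so this is an illustration rather than a reproduction of it. All function names are hypothetical.

```python
def hyp2f1(a: float, b: float, c: float, z: float, tol: float = 1e-12) -> float:
    """Gauss hypergeometric function via its power series (used here with 0 <= z <= 1, c > a + b)."""
    term, total = 1.0, 1.0
    for k in range(10_000):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol:
            break
    return total


def olkin_pratt(r: float, n: int) -> float:
    """Approximately unbiased correlation; assumes the form with parameter (n - 2) / 2."""
    return r * hyp2f1(0.5, 0.5, (n - 2) / 2.0, 1.0 - r * r)


def mean_corrected_r(rs, n):
    """Unweighted mean of corrected correlations (an assumed averaging scheme)."""
    corrected = [olkin_pratt(r, n) for r in rs]
    return sum(corrected) / len(corrected)


N = 160  # sample size of Study 4
charisma_related = [0.36, 0.54, 0.54, 0.79, 0.63, 0.63, 0.64, 0.70, 0.70, 0.73, 0.67]
charisma_adjacent = [0.57, 0.61, 0.59, 0.35, 0.57, 0.61, 0.60, 0.60]
passive = [-0.01, -0.01]

mean_related = mean_corrected_r(charisma_related, N)
mean_adjacent = mean_corrected_r(charisma_adjacent, N)
mean_passive = mean_corrected_r(passive, N)
print(f"{mean_related:.2f} {mean_adjacent:.2f} {mean_passive:.2f}")
```

At n = 160 the correction changes each coefficient only in the third decimal, so the means remain close to the plain averages of the listed correlations.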

To summarize, the CLTS corresponded well with questionnaires following a different conceptual approach to assess charisma, while it did not bear relations with the ineffective passive-avoidant leadership behaviors. This finding supports the convergent and divergent validity of the scale.

7 Step 3: The trial by fire of construct and criterion-related validity

7.1 Study 5: Coded and automated measurement of charismatic leadership tactics

Next, I examine what can be considered the most fundamental test for the CLTS, namely whether the CLTS measures the charismatic leadership tactics it is intended to measure. Specifically, this study aims to investigate whether observers can accurately identify charismatic leadership tactics employed by leaders using the CLTS after a single exposure to them. To this end, I correlate observers' responses on the CLTS with objective measurements of charismatic leadership tactics, specifically manual codings of these tactics and automated measurements of verbal and nonverbal tactics in transcripts and videos of the leaders.

To subject the CLTS to this litmus test, I devised a study that integrates several methodological approaches. Initially, video recordings of leaders were collected as stimulus material. Based on their easy accessibility and high degree of standardization, I collected speeches made by members of the US Senate and broadcasted via the television network C-SPAN as the target sample (80 politicians, 26.3% female, Mage = 60.69, SD = 10.80, range 37–85). To obtain objective data on the charismatic leadership tactics politicians used in their speeches, one recent speech of each selected politician was manually coded for these tactics by 12 trained coders. In addition, I further conducted automated text analyses of the speech transcripts (LIWC; e.g., Fanelli et al. 2009), as well as an automated analysis of gesture expressivity (open-source real-time human pose detection library, “OpenPose”, Cao et al. 2017; gesture analyses could only be computed for 76 of the speeches; see Supplementary information).

To obtain ratings of the observed charismatic leadership tactics using the CLTS and impressions of politicians’ charisma, a sample of observers was recruited to watch and evaluate a selection of videos. Speeches were randomly allocated to raters from the UK who were recruited through the platform Prolific.co (359 ratings from 274 raters, 50.4% female, Mage = 36.00, SD = 10.02, range 18–73), resulting in a mean of 4.49 ratings for each politician’s speech. Participants received monetary compensation for providing the ratings. Raters assessed perceived charismatic leadership tactics using the CLTS (ω = 0.90), as well as leaders’ charisma with the selection of items from the MLQ-5X Short employed in the previous studies (ω = 0.96; Avolio and Bass 2004; German translation by Felfe 2006).

I report Pearson’s correlation coefficients for all correlation analyses (see Supplementary Table 3 and Fig. 1). Overall, the number of charismatic leader behaviors coded corresponded with observer ratings on the CLTS (r = 0.34, p = 0.002) and on the selection of items from the MLQ (r = 0.24, p = 0.036). Furthermore, ratings on the items describing verbal tactics were related to the sum of verbal (r = 0.25, p = 0.027) but not nonverbal (r = 0.18, p = 0.114) behaviors coded. Ratings of the nonverbal tactics reflected both the actual amount of verbal (r = 0.22, p = 0.049) and nonverbal (r = 0.37, p = 0.001) behaviors.

In more detail, ratings of the politicians employing rhetorical questions were related to their actual usage (r = 0.34, p = 0.002), frequent smiling to actual smiles (r = 0.30, p = 0.008), telling stories to convey a point to actual storytelling (r = 0.31, p = 0.006), using gestures while speaking to actual gesturing (r = 0.55, p < 0.001), the usage of metaphorical language with the frequency of storytelling (r = 0.22, p = 0.046) but only near significant levels with metaphors (r = 0.21, p = 0.062), and facial expressions related to smiles (r = 0.22, p = 0.045) and lowered eyebrows (r = 0.23, p = 0.044). By contrast, the items rating an increased employment of visions (r = 0.13, p = 0.245) and contrasts (r = 0.15, p = 0.188) did not directly reflect their coded counterpart.

Regarding the objective computerized text analysis of the speeches’ content, I found ratings of the speaker exhibiting strong facial expressions to relate to the general affectivity of the speech (r = 0.30, p = 0.007), indicating that facial expressions were actively employed to substantiate the speeches' content and were recognized by the observers. Furthermore, ratings of the speaker having vision corresponded to the environmental (r = 0.31, p = 0.005) and social (r = 0.27, p = 0.015) but not to the economic (r = 0.15, p = 0.190) value orientation of the speaker. Lastly, objectively measured gesture expressivity corresponded with ratings of charismatic leadership tactics in general (r = 0.41, p < 0.001) and with ratings on the frequency of employed gestures specifically (r = 0.48, p < 0.001). Results largely matched those of additional analyses in which age and gender were controlled using partial correlations (see Supplementary information).

These results indicate that the CLTS matches well with coded and objectively measured charismatic leadership tactics. This finding constitutes the strongest support for the criterion validity of the CLTS, as it demonstrates that the scale indeed measures what it intends to measure, extending even to the individual items of the scale. This is consistent with a previous finding where it was shown that even rapidly changing phenomena, such as leaders' gaze patterns, could be sufficiently measured by the observer- and self-reports of their gaze behavior (r = 0.30, p = 0.009; Maran et al. 2019, Study 2). Having established that the CLTS effectively measures leaders’ application of charismatic leadership tactics, the subsequent steps involve examining the scale's relationship with the theoretically assumed antecedents and effects of these tactics.

7.2 Study 6: Cognitive abilities as antecedents of the production of charismatic leadership tactics

Next, I further strengthen construct and criterion-related validity by examining the link of charismatic leadership tactics, as measured by the CLTS, to the abilities charisma is expected to signal and thus its role as a possible consequence of these constructs. More specifically, I measure the cognitive abilities of individuals and test whether higher cognitive abilities are associated with the production and use of more charismatic tactics. Viewed through the lens of the signaling account of charisma, charismatic tactics should be costly to produce because they can only be produced by leadership aspirants with higher leadership abilities and, therefore, provide honest information about the presence of these abilities in a candidate. Suspects for these abilities are the cognitive abilities of candidates that predict leadership effectiveness (e.g., Antonakis et al. 2022). In a nutshell, if the CLTS measures a higher propensity to use charismatic tactics, then these should be a consequence of, and therefore related to higher cognitive ability.

I tested this prediction in a sample of 174 participants (63.8% female; Mage = 22.82, SD = 2.87, range: 18–32) from Austria, Germany, Switzerland, and Liechtenstein. All of them aspired to found a start-up or take up management positions in firms, and supporting these aspirations was the purpose of the network through which they were recruited. Participants were contacted via the network of a youth section of a business association and through personal contacts and requested to complete a questionnaire. Questionnaires were composed of the CLTS (ω = 0.66), the selection from the MLQ 5X Short to assess charismatic leadership (ω = 0.81; Avolio and Bass 2004; German translation by Felfe 2006), as well as the General Charisma Inventory (GCI; Tskhay et al. 2018) to assess participants' charismatic influence (ω = 0.74) and affability (ω = 0.64). To gain a broad and valid picture of the aspirants’ cognitive abilities, I further employed Raven’s Advanced Progressive Matrices (RAPM; Raven et al. 1998) to assess their fluid intelligence, the Alternative Uses Task (AUT; Guilford et al. 1960) to measure their divergent thinking ability, and the Remote Associates Test (RAT; Mednick and Mednick 1967) to assess their convergent thinking ability (see Supplementary information).

I first computed Pearson’s product-moment-correlation coefficients to assess the relationship between individuals’ cognitive abilities, their propensity to use charismatic tactics, and their self-rated charisma. I replicated these analyses as partial correlations, controlling for possible confounding effects of sex and age. Furthermore, to examine the expected flow of cognitive abilities increasing the frequency of charismatic signaling, which ultimately should result in increased ascriptions of charisma, I proposed mediation models including the cognitive abilities as predictors of charisma self-ratings, mediated by the usage of charismatic leadership tactics. Again, to control for possible effects of sex and age, I included these variables as covariates to the models. I used the SPSS macro PROCESS v4.0 (Hayes 2022) to compute these models at 5000 bootstrapping samples. To account for the biasing effects of heteroskedasticity, I further calculated robust standard errors using the heteroskedasticity consistent estimator 3 (HC3; Davidson and MacKinnon 1993). I report standardized coefficients for the mediation analyses, and indirect effects were deemed significant if the estimate’s 95% bootstrapping confidence interval did not include zero.

Neither fluid intelligence nor convergent thinking was related to charismatic leadership tactics or endogenous measures of charisma (all p’s > 0.05, see Supplementary Table 4 and Fig. 1). However, divergent thinking was related to the CLTS (r = 0.24, p = 0.001) and to the affability dimension of general charisma (r = 0.17, p = 0.044). When it comes to the relationship between the CLTS and the outcome-centric charisma questionnaires, the CLTS corresponded with charismatic leadership (r = 0.43, p < 0.001) and both dimensions of general charisma, influence (r = 0.45, p < 0.001) and affability (r = 0.26, p = 0.001).

The mediation analyses revealed that divergent thinking abilities did indeed indirectly (γ = 0.13, SE = 0.04, 95% CI = 0.06 to 0.21) rather than directly (γ = -0.04, SE = 0.08, p = 0.577) shape charismatic leadership via the pathway of charismatic leadership tactics. I further found consistent results for the influence (direct effect: γ = 0.02, SE = 0.08, p = 0.778; indirect effect: γ = 0.12, SE = 0.04, 95% CI = 0.06 to 0.20) and affability dimension of general charisma (direct effect: γ = 0.04, SE = 0.08, p = 0.641; indirect effect: γ = 0.08, SE = 0.03, 95% CI = 0.02 to 0.15).
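The logic of testing an indirect effect with a percentile bootstrap can be sketched in a few lines of Python. This is a simplified illustration and not the PROCESS procedure used in the study: it omits the covariates (sex and age), the standardization of coefficients, and the HC3 standard errors, and it runs on simulated data rather than the study's data. All names and the simulated effect sizes are assumptions.

```python
import random
from statistics import fmean


def cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def indirect_effect(x, m, y):
    """Product a*b for the simple mediation X -> M -> Y.

    a: slope of M regressed on X.
    b: slope of Y on M with X held constant (two-predictor OLS in closed form).
    """
    vx, vm = cov(x, x), cov(m, m)
    cxm, cxy, cmy = cov(x, m), cov(x, y), cov(m, y)
    a = cxm / vx
    b = (cmy * vx - cxm * cxy) / (vx * vm - cxm ** 2)
    return a * b


def percentile_ci(x, m, y, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap interval: resample cases, re-estimate a*b, take quantiles."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    draws.sort()
    lower = draws[int(n_boot * (1 - level) / 2)]
    upper = draws[int(n_boot * (1 + level) / 2) - 1]
    return lower, upper


# Simulated data only (NOT the study's data): X -> M -> Y without a direct path.
sim = random.Random(42)
x = [sim.gauss(0, 1) for _ in range(174)]          # e.g. divergent thinking
m = [0.6 * xi + sim.gauss(0, 1) for xi in x]       # e.g. CLTS score
y = [0.6 * mi + sim.gauss(0, 1) for mi in m]       # e.g. charisma self-rating

estimate = indirect_effect(x, m, y)
lower, upper = percentile_ci(x, m, y, n_boot=2000)  # fewer draws than 5000, for speed
significant = lower > 0 or upper < 0                # interval excludes zero
print(f"indirect = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

The decision rule mirrors the one stated above: the indirect effect counts as significant when the bootstrap interval does not include zero.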

To summarize, the CLTS, as opposed to the outcome-centric measures of charisma, except for affability, was related to participants’ cognitive capabilities. This indicates that the charismatic leadership tactics assessed are directly related to participants’ cognitive capabilities, especially their ability to generate new and creative ideas and, therefore, act as honest signals for the senders’ characteristics. In addition, these findings reveal the entire path from higher cognitive abilities, specifically divergent thinking, to higher production and utilization of charismatic tactics to the perceived charismatic effect on others.

7.3 Study 7: Charismatic leadership tactics and performance in a face-to-face negotiation

If the CLTS does indeed assess the “leadership vitamin” charisma, it should also be able to capture the expected effects of leaders’ charisma, which is an essential influence in social interactions. I therefore conclude this step by assessing how charismatic signaling, as assessed by the CLTS, relates to individuals’ influencing success in a negotiation task.

To investigate this, participants were recruited for a negotiation task in which they had to negotiate for their interests that were linked to points (New Recruit negotiation task; Pinkley et al. 1994; see Supplementary information for details). Fifty participants (76.0% female; Mage = 22.42, SD = 6.09, range 18–59) were assigned randomly to the recruiter or job candidate role in the task, resulting in 25 negotiation dyads. Throughout a 30-min negotiation, they had to decide on one of five settlement options for each issue, each rewarding different quantities of points, dependent on a role-specific payout plan disclosed to only the respective participants themselves. After the negotiation, both participants rated their use of charismatic leadership tactics on the CLTS (ω = 0.68) and their charismatic influence (ω = 0.78) and affability (ω = 0.73) on the General Charisma Inventory (GCI; Tskhay et al. 2018).

I calculated Pearson correlations between participants’ self-reported use of charismatic leadership tactics and their performance in the negotiation task. Firstly, I analyzed the influence of one individual exhibiting charismatic leadership tactics more frequently than their negotiation partner (i.e., the difference between both negotiators’ self-ratings) on their negotiation success to the detriment of their interlocutor (i.e., the difference between both negotiators’ scores) and found a larger difference in charismatic signaling to be related to a larger difference in points gained (r = 0.50, p = 0.011; see Supplementary Table 5 and Fig. 1). Secondly, across all participants, I found higher self-ratings of charismatic leadership tactics to be related to fewer points gained by the opposing negotiator (r = -0.34, p = 0.015), yet not to an increase in the number of points achieved by themselves (r = 0.11, p = 0.445; see Supplementary Table 5). In comparison, higher self-ratings on neither the influence nor the affability dimension of charisma were associated with either the points gained by oneself (influence: r = 0.22, p = 0.134; affability: r = -0.07, p = 0.616) or those gained by the other participant (influence: r = -0.13, p = 0.374; affability: r = -0.10, p = 0.512; see Supplementary information for analyses that control for age and gender).
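The dyad-level analysis described first can be sketched as follows: for each dyad, take the difference between the two negotiators' CLTS self-ratings and the difference between their point totals, then correlate the two difference scores. The values below are invented purely for illustration and do not come from the study; the tuple layout and all names are assumptions.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented toy dyads: (recruiter CLTS, candidate CLTS, recruiter points, candidate points)
dyads = [
    (3.8, 3.1, 5200, 4100),
    (2.9, 3.5, 3900, 4800),
    (3.3, 3.2, 4500, 4600),
    (4.0, 3.0, 5600, 3800),
    (3.1, 3.6, 4200, 4400),
]

# One difference score per dyad, always computed as recruiter minus candidate.
clts_diff = [rec_clts - cand_clts for rec_clts, cand_clts, _, _ in dyads]
points_diff = [rec_pts - cand_pts for _, _, rec_pts, cand_pts in dyads]

r_dyad = pearson_r(clts_diff, points_diff)
print(f"r between CLTS difference and points difference: {r_dyad:.2f}")
```

Because each dyad contributes a single pair of difference scores, the number of observations in this correlation is the number of dyads (25 in the study), not the number of participants.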

To conclude, more frequent charismatic leadership tactics were related to fewer points achieved by the respective negotiation partner. When focusing on the interaction between the negotiators, I found an increased disparity in points achieved for negotiators with a larger difference in their tendency to engage in charismatic leadership tactics. Following up on the previous study, these findings further suggest that the charismatic leadership tactics assessed by the CLTS predict social influence and, therefore, the ability to get ahead in negotiations.

8 Step 4: Proving sensitivity to change

8.1 Study 8: Training managers and entrepreneurs in charismatic leadership tactics and measuring behavior change

In this fourth step, I aimed to assess the scale’s sensitivity towards changing charismatic leadership tactics. This allowed me to investigate whether charismatic leader behaviors are memorable and distinctly observable or rather prone to evaluation biases. By systematically varying leaders’ use of charismatic leadership tactics through a training, observer ratings from the trainees’ followers and peers should reflect the degree of observability and memorability of the behaviors. I, therefore, designed a multi-session intervention program teaching managers and entrepreneurs to implement verbal and nonverbal charismatic leader behaviors in their speeches and everyday communication. This evidence-based training employed an action learning approach (e.g., Frese et al. 2003) encompassing both instructor input and peer exercises. To account for possible Hawthorne effects, I furthermore ran an active control group that did not acquire any training or information on charismatic communication techniques but instead participated in a general course on leadership, following a similar teaching approach including both lecturer input and action learning, but without giving instructions on charismatic tactics.

The sample consisted of 50 managers who participated in an MBA program and entrepreneurs who participated in a training program at the university. They were split into two equal groups of 25 people, the intervention group (8.0% female; Mage = 27.52, SD = 4.87, range: 19–38) and the control group (44.0% female; Mage = 20.80, SD = 1.04, range: 19–23). The allocation was not randomized but naturalistic, with the groups being trained one after the other.

All participants were asked to answer self-rating questionnaires before and after the intervention or control setting and to gather peer ratings of acquaintances they regularly worked with (co-founders, peers, or subordinates). To assess the CLTS’s sensitivity to change, self- (pre-intervention: ω = 0.73; post-intervention: ω = 0.77), as well as peer-ratings (observer version of the CLTS; pre: ω = 0.78; post: ω = 0.83), were collected before and after the subjects partook in the charisma intervention or control treatment. Furthermore, I compared the performance of the CLTS to self- (pre: ω = 0.77; post: ω = 0.83) and peer-ratings (pre: ω = 0.88; post: ω = 0.83) on the selection of items measuring leaders’ charisma from the MLQ 5X Short.

To assess the scale’s sensitivity to changes in the rated individuals’ charismatic signaling, I computed analyses of variance for repeated-measures designs, including the participation in the intervention or control group as a between-subject factor, as well as pairwise comparisons and pairwise t-tests to gain further insights into main and interaction effects. Lastly, I report point-biserial correlation coefficients between the intervention/control condition and all collected charisma variables (see Supplementary Table 6). I standardized all data before conducting the analyses.

I found self-ratings on the CLTS to be substantially higher after the intervention as compared to before (MD = -0.25, F(1,48) = 18.36, p < 0.001, ηp2 = 0.28), with both ratings in the intervention- (MD = -0.71, t(24) = -3.63, p = 0.001) and the control-group (MD = -0.27, t(24) = -2.28, p = 0.032) increasing, thus indicating a possible Hawthorne effect of participating in the study that led to an insignificant interaction effect (F(1,48) = 3.62, p = 0.063). For the follower and peer ratings, I also found a difference between the ratings before and after the intervention (MD = -0.39; F(1,48) = 10.82, p = 0.002, ηp2 = 0.18), which was mainly attributable to the ratings for the participants of the intervention (MD = -0.70, t(24) = -3.90, p = 0.001), not for the control group (MD = -0.07, t(24) = -0.48, p = 0.637). This was further reflected in a clear interaction effect between the rated individual being part of the training and the time of data acquisition (F(1,48) = 7.15, p = 0.010, ηp2 = 0.13; Fig. 2), indicating that followers and peers detected changes in the trainees’ charismatic leadership tactics.
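For a design with two groups and two measurement occasions, the group-by-time interaction of the repeated-measures ANOVA is equivalent to a pooled-variance t-test on the pre-to-post gain scores, with F = t² and error degrees of freedom N − 2 (48 for two groups of 25). The following sketch illustrates this equivalence on invented ratings; the function name and all data are assumptions and do not reproduce the study's values.

```python
from math import sqrt
from statistics import fmean, variance


def interaction_from_gains(pre_a, post_a, pre_b, post_b):
    """Group x time interaction in a 2 x 2 mixed design, computed from gain scores.

    Returns (t, F, df_error), where F = t**2 is the interaction F(1, df_error).
    """
    gains_a = [post - pre for pre, post in zip(pre_a, post_a)]
    gains_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(gains_a), len(gains_b)
    df_error = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(gains_a) + (n_b - 1) * variance(gains_b)) / df_error
    t = (fmean(gains_a) - fmean(gains_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, t * t, df_error


# Invented standardized ratings, for illustration only.
trained_pre = [-0.4, 0.1, -0.2, 0.3]
trained_post = [0.5, 0.6, 0.4, 0.9]
control_pre = [-0.1, 0.2, 0.0, -0.3]
control_post = [0.0, 0.1, 0.2, -0.2]

t_value, f_value, df_error = interaction_from_gains(
    trained_pre, trained_post, control_pre, control_post
)
print(f"t({df_error}) = {t_value:.2f}, interaction F(1, {df_error}) = {f_value:.2f}")
```

Read this way, a significant interaction means that the average change from pre to post differs between the trained and the control group, which is the pattern reported for the follower and peer ratings.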

Fig. 2

The figures show the effects of the charismatic leadership training. They depict the changes before and after the intervention for the control group (gray boxes) and the intervention group (green circles), recorded for the managers and entrepreneurs themselves (a, b), as well as for their peers or followers (c, d) (color figure online)

Similarly, the self-ratings of leaders’ charisma increased between the two data acquisitions before and after the intervention or control treatment (MD = -0.39; F(1,48) = 8.87, p = 0.005, ηp2 = 0.16). This increase was more pronounced in the intervention (MD = -0.50, t(24) = -2.26, p = 0.033) than in the control group (MD = -0.27, t(24) = -2.02, p = 0.055), yet changed in a similar pattern, thus not causing an interaction effect (F(1,48) = 0.79, p = 0.379; see Fig. 2). When it comes to the follower and peer ratings on this measure, I found no main effect (MD = -0.11; F(1,48) = 0.75, p = 0.391) but again, an interaction of the time of rating and the subjects’ participation in the intervention or control group (F(1,48) = 6.27, p = 0.016, ηp2 = 0.12). As before, the follower and peer ratings increased for participants of the intervention (MD = -0.42, t(24) = -2.26, p = 0.033), yet for the control group, they remained mostly consistent, even showing a slight downward trend (MD = 0.21, t(24) = 1.23, p = 0.230).

Firstly, these findings confirm the CLTS to be sensitive to changes in leaders’ use of charismatic tactics. Peers could observe and remember changes in the charismatic leadership tactics displayed by the active and prospective leaders participating in the study. They could accurately detect an increase in such behaviors in trained participants, while no changes in ratings occurred for the control group. Secondly, the effects of the charisma training were inferable by the behavior-oriented CLTS and the outcome-centric measure of leaders’ charisma. Having an exogenous measure that provides the same results as the established endogenous measure further establishes its value for future research.

9 Step 5: Cross-cultural adaptation

9.1 Study 9: Translation of the scale and testing its psychometric properties in managers

Study 9 aimed to examine the factorial structure, psychometric properties, and convergent validity of the English version of the CLTS in an English-speaking sample of managers. 260 managers (38.5% female, age range from 16 to 69 years, Mage = 39.93, SD = 11.39) leading employees mainly in the manufacturing, healthcare, and technology sectors in the United States (64.6%) and India (23.8%) participated in this study. A requirement for participation was that participants had to be native English speakers or have a native level of English.

The nine German items of the CLTS were translated into English, and the validity of the translation was assessed using the back-translation procedure (Brislin 1970; see Table 1). To assess convergent validity, the English version of the scale was again related to the selection of items from the MLQ 5X Short (Avolio and Bass 2004) measuring leaders’ charisma (Towler 2003; α = 0.90).

I replicated the data analysis procedures stated in Study 1. Exploratory factor analysis, with a satisfactory KMO of 0.798 and a significant Bartlett test at χ2(36) = 385.81, p < 0.001, indicated two factors according to the Kaiser-Guttman criterion at eigenvalues of 3.02 and 1.07. In contrast, the scree plot and parallel analysis again indicated a single factor. The two-factor solution would result in a first factor comprising six items that explain 25.64% of the total variance and a second factor comprising only two items and explaining 6.11%, with the two factors correlating at r = 0.57. The single-factor solution would explain 25.53% of the variance with all items sufficiently loading on the factor at 0.61 for storytelling, 0.60 for gestures, 0.57 for facial expressions, 0.52 for smiling, 0.50 for rhetorical questions, 0.47 for contrasts, 0.43 for metaphorical language, 0.42 for focused gaze, and 0.41 for vision.
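To illustrate how factor loadings translate into McDonald's omega, the reliability coefficient reported throughout, the sketch below applies the standard single-factor formula to the loadings listed above. It assumes standardized loadings and uncorrelated residuals, and it uses the EFA loadings, which need not be the estimates on which the reported coefficient was based; the function name is hypothetical.

```python
def mcdonald_omega(loadings):
    """Omega for a one-factor model with standardized loadings.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances),
    with residual variance 1 - loading^2 per item (uncorrelated residuals assumed).
    """
    common = sum(loadings) ** 2
    residual = sum(1.0 - lam * lam for lam in loadings)
    return common / (common + residual)


# Single-factor EFA loadings reported for the English CLTS (Study 9).
clts_loadings = {
    "storytelling": 0.61,
    "gestures": 0.60,
    "facial expressions": 0.57,
    "smiling": 0.52,
    "rhetorical questions": 0.50,
    "contrasts": 0.47,
    "metaphorical language": 0.43,
    "focused gaze": 0.42,
    "vision": 0.41,
}

omega = mcdonald_omega(list(clts_loadings.values()))
print(f"omega = {omega:.2f}")
```

For these nine loadings the formula gives a value of about 0.75, in line with the omega reported for the retained single-factor solution.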

To gain further insight into the factorial structure of the English version of the CLTS, I again conducted confirmatory factor analyses. The single-factor solution provided an overall good fit to the data (χ2(21) = 23.47, p = 0.320, χ2/df = 1.118; CFI = 0.993; TLI = 0.989; RMSEA = 0.021; SRMR = 0.036; BIC = 156.92; CAIC = 180.92). A model differentiating between the verbal and nonverbal items of the questionnaire showed a notably worse fit (χ2(22) = 46.14, p = 0.002, χ2/df = 2.097; ∆χ2(1) = 22.40, p < 0.001; CFI = 0.936; TLI = 0.896; RMSEA = 0.065; SRMR = 0.049; BIC = 174.04; CAIC = 197.04). The two-factor solution proposed by the EFA, however, provided a comparable fit to the single-factor model (χ2(15) = 21.30, p = 0.127, χ2/df = 1.420; ∆χ2(6) = 2.44, p = 0.875; CFI = 0.982; TLI = 0.966; RMSEA = 0.040; SRMR = 0.036; BIC = 138.08; CAIC = 159.08). Still, based on the previous factor analyses and the lack of interpretability of the second factor, I retained the single-factor solution, which provides an omega of 0.75 and a composite reliability of 0.75. In addition, I replicated the convergent validity analyses with the well-established measure of leaders’ charisma. As before, the CLTS corresponded well with the outcome-centric scale (r = 0.64, p < 0.001; see Fig. 1).
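The p-values attached to the χ2 statistics above can be recomputed from the reported test statistics and degrees of freedom. The following sketch uses only the Python standard library; `chi2_sf` is a hypothetical helper written for this illustration, and it takes the reported values at face value rather than refitting any model.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2 and
    base cases Q(1/2, z) = erfc(sqrt(z)) and Q(1, z) = exp(-z).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Values as reported for the English leader version (Study 9).
print("single factor, chi2/df:", round(23.47 / 21, 3))
print("single factor, p:      ", round(chi2_sf(23.47, 21), 3))
print("delta chi2(1) = 22.40, p:", chi2_sf(22.40, 1))
print("delta chi2(6) = 2.44, p: ", round(chi2_sf(2.44, 6), 3))  # about 0.875
```

A non-significant model χ2 indicates that the model-implied covariance matrix does not deviate detectably from the observed one, whereas a significant ∆χ2 indicates that two models differ in fit; a non-significant ∆χ2, as for the EFA-based two-factor model, is what motivates describing its fit as comparable.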

These analyses replicated the findings of the initial studies on the CLTS’ one-factorial structure and convergent validity with the most established measure of leaders’ charisma, the MLQ. They therefore indicate that the CLTS is a suitable instrument for obtaining leaders’ self-ratings of their charismatic leadership tactics across cultures and languages.

9.2 Study 10: Translation of the scale and testing its psychometric properties in employees

As in Study 2, I aimed to extend the findings on the factorial structure, psychometric properties, and convergent validity of the follower variant in an English translation. A total of 233 workers (35.6% female; Mage = 34.42, SD = 10.36, range 20–75), mainly from the US (64.4%) and India (27.9%), participated in this study. All were in active employment in an organization, predominantly in the technology (33.0%), manufacturing (12.4%), education, or healthcare sectors (11.6% each). Most participants had obtained a college (61.4%) or master’s degree (31.3%). Participation required that participants be native English speakers or have a native level of English. For this study, I employed the follower version of the translated measure and again related it to leaders’ charisma, as measured by a selection from the MLQ 5X Short (Avolio and Bass 2004; α = 0.87), to assess the CLTS’s convergent validity.

I followed the same procedures outlined in Study 1, beginning with exploratory factor analysis. With a satisfactory KMO of 0.850 and a significant Bartlett test at χ2(36) = 450.72, p < 0.001, the analysis again indicated two factors according to the Kaiser-Guttman criterion, with eigenvalues of 3.43 and 1.18. In contrast, as before, the scree plot and parallel analysis suggested a single factor. The first factor of the two-factor solution would comprise four items, explaining 31.21% of the total variance; the second factor would encompass five items (the four nonverbal items and the vision item), explaining 6.28%, with the two factors correlating at r = 0.64. The single-factor solution explains 30.61% of the variance, and all items load sufficiently on the factor: 0.66 for gestures, 0.65 for metaphorical language, 0.60 for storytelling, 0.54 each for facial expressions, rhetorical questions, and focused gaze, 0.51 for contrasts, 0.47 for smiling, and 0.43 for vision.
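The single-factor loadings listed above can be related to the proportion of variance explained and, under a strong assumption, to a composite reliability coefficient. The sketch below treats the rounded exploratory loadings of Studies 9 and 10 as standardized loadings of a congeneric one-factor model with uncorrelated residuals. This is an assumption made for illustration: the reliability coefficients reported in this paper were presumably derived from the confirmatory models, so the numbers here are an approximation, and the function names are hypothetical.

```python
def variance_explained(loadings):
    """Share of total item variance captured by one factor (mean squared loading)."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability for standardized loadings, uncorrelated residuals.

    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of (1 - loading^2))
    """
    total = sum(loadings)
    residual = sum(1.0 - l * l for l in loadings)
    return total**2 / (total**2 + residual)


# Rounded single-factor loadings as reported in the text.
study_9 = [0.61, 0.60, 0.57, 0.52, 0.50, 0.47, 0.43, 0.42, 0.41]
study_10 = [0.66, 0.65, 0.60, 0.54, 0.54, 0.54, 0.51, 0.47, 0.43]

for label, loadings in (("Study 9", study_9), ("Study 10", study_10)):
    print(
        label,
        "variance explained:", round(variance_explained(loadings), 3),
        "composite reliability:", round(composite_reliability(loadings), 2),
    )
```

With the rounded loadings, the variance explained comes out close to the reported 25.53% and 30.61%, and the approximate reliabilities are about 0.75 for Study 9 and about 0.80 for Study 10. Small deviations are expected because the inputs are rounded to two decimals.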

As in the previous studies exploring the factorial structure of the CLTS, I again conducted further confirmatory factor analyses. The single-factor solution provided an overall good fit to the data (χ2(22) = 20.73, p = 0.538, χ2/df = 0.942; CFI = 1.000; TLI = 1.005; RMSEA < 0.001; SRMR = 0.034; BIC = 146.10; CAIC = 169.10). A model differentiating between the verbal and nonverbal items of the questionnaire showed a worse fit (χ2(24) = 30.80, p = 0.160, χ2/df = 1.283; ∆χ2(2) = 10.07, p < 0.001; CFI = 0.984; TLI = 0.976; RMSEA = 0.035; SRMR = 0.042; BIC = 145.27; CAIC = 166.27), and the two-factor solution proposed by the EFA provided the worst fit compared to the single-factor model (χ2(26) = 38.72, p = 0.052, χ2/df = 1.489; ∆χ2(4) = 38.72, p < 0.001; CFI = 0.970; TLI = 0.958; RMSEA = 0.046; SRMR = 0.045; BIC = 142.29; CAIC = 161.29). Therefore, I again retained the single-factor solution, which provides an omega of 0.80 and a composite reliability of 0.80. Turning again to convergent validity, I found the follower ratings on the CLTS to correspond well with leaders’ charisma as rated on the MLQ (r = 0.54, p < 0.001; see Fig. 1).
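The RMSEA values reported for the English versions can be related to the χ2 statistics, degrees of freedom, and sample sizes (260 managers in Study 9, 233 workers in Study 10). The sketch below assumes the common point-estimate formula with N − 1 in the denominator; some software uses N instead, and the paper does not state which variant was applied, so the function is illustrative rather than a reproduction of the original analysis.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    Assumes the N - 1 variant; the estimate is truncated at zero when the
    chi-square statistic is smaller than its degrees of freedom.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# (label, chi2, df, sample size) as reported in the text.
models = [
    ("Study 9, single factor", 23.47, 21, 260),
    ("Study 9, verbal vs. nonverbal", 46.14, 22, 260),
    ("Study 9, EFA two-factor", 21.30, 15, 260),
    ("Study 10, single factor", 20.73, 22, 233),
    ("Study 10, verbal vs. nonverbal", 30.80, 24, 233),
    ("Study 10, EFA two-factor", 38.72, 26, 233),
]

for label, chi2, df, n in models:
    print(f"{label}: RMSEA = {rmsea(chi2, df, n):.3f}")
```

For the Study 10 single-factor model, χ2 is smaller than its degrees of freedom, so the estimate is truncated at zero, which is consistent with the reported RMSEA < 0.001.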

As in the previous study, the factorial structure and convergent validity of the observer-measure for the English translation of the CLTS were replicated, indicating that both the leader- and follower-rated questionnaires are equally suitable cross-cultural measures for charismatic leadership tactics.

10 General discussion

Drawing on the signaling approach (Antonakis et al. 2016), this work introduces the first scale to behaviorally measure charismatic leadership tactics via self- and observer-report, circumventing the conceptual pitfalls of existing questionnaire measures. The scale assesses the use of nine charismatic leadership tactics (Soto and John 2019), proven to have a signaling effect in leader–follower interactions. Across ten studies, the scale demonstrates a replicable one-factor structure (Studies 1, 2, 3, 9, 10) and good internal consistency (all studies). It shows moderate to high self-other agreement (Study 3) and exhibits the hypothesized convergent, divergent, and incremental validity compared to established measures (Studies 3, 4, 5, 6, 8). The scale displays adequate criterion-related validity (Studies 5, 6, 7), is embedded between conceptually relevant antecedents (Study 6) and expected outcomes (Studies 3, 7), and is sensitive to changes in signal use (Study 8). Notably, the scale can be translated into another language without compromising its psychometric properties or factor structure (Studies 9, 10). Thus, the CLTS consistently meets the gold standard for measuring leadership (see Table 2; Crawford and Kelder 2019; Wright et al. 2017).

The most critical test of the new questionnaire concerns its external validity. Specifically, does the CLTS accurately capture managers’ actual use of charismatic signals? Three key findings from the current study support the assertion that the CLTS is an externally valid instrument for measuring charismatic leadership tactics. First, managers and employees show significant agreement in their perceptions of managers' charismatic signal use (24% shared variance, uncorrected r = 0.49, p < 0.001; Study 3), aligning with previous findings and meta-analytic evidence on other leader behaviors (Lee and Carpenter 2018; Amundsen and Martinsen 2014). Second, observers’ CLTS ratings of charismatic tactics correlate with objective measurements of those tactics used by leaders in videos, demonstrating that the CLTS captures actual behaviors (Study 5; e.g., Maran et al. 2019). Third, the CLTS is sensitive to changes in charismatic tactic use, both when reported by managers and when assessed by observers (Study 8). Together, the findings attest to the CLTS’ external validity as an effective measure of the charismatic leadership tactics managers employ.

The instrument's development yields rich empirical support for the signaling approach to leaders' charisma (Antonakis et al. 2016). When employed as an exogenous variable in cross-sectional designs, assessing managers’ use of charismatic leadership tactics demonstrates incremental validity over measures capturing managers’ charisma as an effect (Study 3). Moreover, the findings bolster the signaling approach’s assertion that signals of leaders' charisma convey a valuable leadership ability. Individuals with higher divergent thinking ability, a cognitive ability that supports creativity (e.g., Silvia et al. 2013) and relates strongly to fluid intelligence (e.g., Nusbaum and Silvia 2011), employ more charismatic signals and consequently appear more charismatic to their audience (e.g., von Hippel et al. 2016; Study 6). Furthermore, recipients act upon the charismatic signals captured by the CLTS. For example, these signals render recipients more easily influenced in negotiations, skewing outcomes in the sender’s favor (Study 7). Finally, the results reinforce existing evidence that managers and entrepreneurs can be trained in charismatic leadership tactics (Antonakis et al. 2011; Frese et al. 2003; Towler 2003). The CLTS sensitively detects these training effects in both managers’ and entrepreneurs’ self-reports and their peers’ observer reports (Study 8).

The introduction of the scale enables efficient measurement of charismatic leadership via self- and peer-report in contexts where the observation and coding of managers’ direct behavior are either impossible or too time-consuming. This includes samples lacking charismatic artifacts, such as videos, audio recordings, text transcripts, or other data (Chandler et al. 2023), and extends to middle managers or executives in privately held firms. The scale also facilitates exploration of unanswered questions requiring large samples or initial cross-sectional exploration. For example, researchers can more efficiently examine how the emergence and impact of charismatic leadership depends on situational factors (Shamir and Howell 1999; Oc 2018), including a firm’s environment, strategy, life cycle stage, culture, structure, task types, and follower characteristics (e.g., Davaei and Gunkel 2024; Jansen et al. 2009; Stoiber et al. 2023; Zaech and Baldegger 2017).

Despite the evidence supporting the scale's validity, important limitations should be considered when interpreting these findings and applying the scale. First, the scale’s accuracy depends on respondents’ ability to recall the frequency of their own or their leaders’ use of charismatic tactics (Antonakis et al. 2016). While the current study demonstrates the scale’s sensitivity to changes in self and observer reports over time, further research is needed to determine whether time-lagged reports reflect the actual variance in managers’ use of these tactics. Second, the selection of nine signals was primarily driven by methodological considerations (Soto and John 2019). Future research should explore whether integrating additional signals enhances the scale's predictive power. For instance, expressing moral convictions or values (Antonakis et al. 2016; Lin et al. 2022), using "we-talk" to frame the group as a reference frame (Fladerer et al. 2021), employing unconventional clothing styles (Maran et al. 2021, 2022), and the prosodic features of a manager's voice (Niebuhr et al. 2017) are all linked to perceptions of charisma and leader effectiveness. Third, the “leadership vitamin” metaphor suggests that managers' charisma interacts with other leader behaviors instead of operating in isolation. For example, vision presentation is particularly effective when combined with task-related behaviors like goal setting or operational instructions (Gochmann et al. 2022; Liegl and Furtner 2024), potentially helping employees connect their work to the broader vision and thus increase effort (Maran et al. 2022). Further research is needed to clarify this interplay and the precise role of charismatic tactics in the leadership process.

To conclude, this work introduces a new scale, the Charismatic Leadership Tactics Scale (CLTS), which measures managers’ use of charismatic leadership tactics while avoiding limitations of prior conceptualizations. The CLTS operationalizes the signaling approach to leaders' charisma (Antonakis et al. 2016), enabling researchers to study managers' charismatic signaling as an exogenous independent variable unconfounded by outcomes. In a multi-stage litmus test across ten studies, the scale demonstrates strong psychometric properties and criterion-related validity. It provides an efficient means to re-examine prior research on charismatic leadership that relied on endogenous measures, which are influenced by their effects. Moreover, the CLTS enables novel research on the impact of leaders' charisma in samples precluding behavioral observation or experimental manipulation. By offering a valid, exogenous measure of charismatic leadership, the CLTS advances the field's ability to robustly test theory on this important phenomenon.