1 Introduction

Financial information conveyed through analysts’ earnings forecasts has long been considered an important element in the valuation of a firm, as it can strongly affect the firm’s stock price returns.Footnote 1 Research findings documenting this understanding have been reported by Abarbanell and Bernard (1992), Bao et al. (1997), Beneish and Harvey (1998), Lim (2001), Diether et al. (2002), Copeland et al. (2004), Loh and Mian (2006), and numerous other authors whose work has been reviewed in a survey article by Ramnath et al. (2008, Sect. 3.3).

It is hence important to understand the inner workings of analysts’ information processing, its efficiency, and its underlying contributing factors. To this end, researchers have long studied the relative accuracy of analysts’ earnings forecasts in hopes of acquiring greater insight into these subjects. Such research work has been reported by Richards (1976), Brown and Rozeff (1980), O’Brien (1987, 1990), Butler and Lang (1991), Stickel (1992), Sinha et al. (1997), and other authors whose empirical findings have been summarized in Ramnath et al. (2008, Sect. 3.2).

Earlier research papers on analysts’ relative earnings forecast accuracy, roughly those published before 1991, reported little evidence of persistent differential performance among analysts. The empirical results were viewed as evidence in support of the hypothesis that superior (inferior) analysts do not exist.Footnote 2 Contrary to this view, Stickel (1992) and Sinha et al. (1997) provide empirical findings that suggest the existence of superior analysts. These two studies have motivated and provided a guidepost for the research presented in this paper. To help the reader capture the main idea, the questions raised in their work, and the reasons behind our present work, we highlight in the next two paragraphs the research findings reported therein.

Stickel (1992) analyzed the earnings forecast accuracy of members of the All-American Research Team for the years 1981–1985. He showed evidence suggesting that team members exhibited superior accuracy in earnings forecasts during their tenure on the team. Two observations are of special interest. First, the membership of the team was elected by users of analysts’ services, predominantly institutional investors, via a voting scheme sponsored by the publisher of Institutional Investor. Second, Sinha et al. (1997, p. 38, endnote #2, based on the data of 1985–1990) later pointed out that the length of tenure of membership on the team was on average less than 2 years. The first observation raises the question: In what other ways can we sort out the “superior” analysts from the “inferior” analysts without a time-consuming and obviously subjective voting scheme? The second observation leads us to another question: How long can a superior (inferior) analyst stay superior (inferior) over multiple years?

The second paper, by Sinha et al. (1997), presents an ex ante analysis that investigates whether an analyst’s past superior (inferior) performance in a formation sample period can carry forward to a holdout sample period (of 1 year) on a firm-specific, and industry-specific, basis. They show that, based on the data for the period 1984–1993, there was a significant carryover effect for superior performance, but not for inferior performance. The question left unanswered was: How prevalent and strong is this phenomenon across different industries and time periods?

Besides these open questions, we noted that, to date, research on this subject has been carried out on an ad hoc basis. The relative accuracy of analysts’ earnings forecasts was analyzed without an organized statistical model structure. A case in point is Sinha et al. (1997). On the other hand, characteristics of analysts with superior/inferior forecast accuracy, normalized by average absolute forecast errors in a firm, were analyzed without first having their relative ranking identified or their persistence checked (Clement 1999; Ghosh and Whitecotton 1997; Jacob et al. 1999; Mikhail et al. 1997, 2003). In light of these observations, in this paper we conduct a comprehensive empirical analysis employing a newer database and more efficient statistical modeling concepts and techniquesFootnote 3 derived from the Markov chain model.Footnote 4 Our aim is to systematically explore the nature, stochastic properties, and determinants of analysts’ relative forecast accuracy.

More specifically, we conduct our analysis to address three linked questions: first, does an analyst’s superior (inferior) forecasting accuracy statistically persist over time; second, if so, how long will the superior (inferior) performance last; and third, what are the determinants of analysts’ relative forecasting accuracy?

A brief preview of our analytic approach and empirical findings follows.

We adopted a measure for analysts’ forecast accuracy called the Absolute Proportional Forecast Error (AFE). We then compiled historical transition probability matrices of the quintile rank of analysts’ average annual AFE from a formation year to five subsequent observation years. This was done for the overall dataset and for each of the thirteen major industries separately. Our results suggest that analysts are highly heterogeneous in their forecast accuracies, and that their relative performance persists much more strongly than suggested by a pure chance model. This persistence is then characterized by a “first order” and, when appropriate, a “second order” Markov chain model. We call such persistence “first degree Markov persistence” and “second degree Markov persistence”,Footnote 5, Footnote 6 respectively.

We then identified a two-component Markov chain model that partitions the hidden influencing factors into two separate components. One is named “transient” and the other “long-lasting”. We then estimated the proportional share and the strength of each component. We have found that, for the overall dataset, the transient and the long-lasting component each contributes to analysts’ time pattern in relative forecast accuracy in a ratio of about 58–42, while the staying power, in terms of the half-life span, of the latter is about ten times that of the former. However, the share and strength of either component varies considerably among different industries. We also observe that, while analysts’ relative forecast accuracy exhibits the first degree Markov persistence in each of the thirteen industries, it shows evidence of the second degree Markov persistence in just five industries.

To search for long-lasting influencing factors, we linked analysts’ relative performance to their professional characteristics and salient features of the firms that they follow. The results show that, based on data from the years 2004 through 2006 and for the overall dataset, with other things being held constant, an analyst with longer experience performs better, while an analyst with heavier workload performs worse. In a similar manner, an analyst’s forecast accuracy is better for firms with larger capitalization sizes. Further, the size of the sponsoring brokerage house and the average capitalization-size growth rate of the followed firms do not significantly affect an analyst’s performance in the sample data that we study. But these findings for the overall dataset do not apply uniformly to individual industries. This observation appears to suggest that analysts’ performance in different industries hinges importantly on different types of information gathering skills, analytical capacities, and other factors peculiar to individual industries. A discriminant function analysis on the predictive power of the set of explanatory variables confirms the validity of our findings.

As a check on the robustness of our findings, we used an alternative relative accuracy measure designed to control firm-specific factors. The empirical results confirm the existence of the “first degree Markov persistence” for most industries.

We hasten to point out that the analysis reported in this study can help improve investors’ utilization of analysts’ earnings forecasts. For instance, it can be used to sort out “superior” analysts from “inferior” analysts. A composite earnings forecast can then be constructed by placing more weight on forecasts made by superior analysts and less on forecasts made by inferior analysts. Such a forecast will likely perform better than the now-prevalent simple consensus forecast, which places equal weight on forecasts made by all analysts following a firm. It is also hoped that a modified composite earnings forecast can provide early signs of shifts (innovations) in the earnings trend, and help characterize intrinsic investment risks in real-time investment analysis.

The organization of the remainder of this paper is as follows: Section 2 describes the data structure and constraints. Section 3 defines the basic variables and explains the methodology. Section 4 provides descriptive statistics of the analyst data and reports preliminary findings on Markov persistence in various sub-samples. Section 5 reports formal test results for the first and second degree Markov persistence for thirteen industries. Section 6 presents estimation results for a two-component Markov chain model. Section 7 reports results that link analysts’ performance quintile ranking to characteristics of the analysts and the followed firms. Section 8 provides additional results based on an alternative measure for accuracy ranking. Section 9 summarizes the findings of the study and provides concluding remarks.

2 Data structure and constraints

The data for our study come from two sources and consist of all firms that appear in both (a) the Institutional Brokers Estimate System (I/B/E/S) files and (b) the Center for Research in Security Prices (CRSP) files. We obtain earnings data from I/B/E/S and price and capitalization data from CRSP. Forecasts of quarterly earnings reported by analysts from over 300 brokerage firms are extracted from the I/B/E/S Detail History files. Each forecast record made by an analyst for a firm and a fiscal quarter contains the company ticker, brokerage firm identifier, analyst identifier, ending date of the forecasted quarter, earnings estimate, and the date on which the forecast was made. I/B/E/S retains, to the best of its ability, the analyst identifier code as an analyst moves from one brokerage firm to another. The sample covers the period from the first quarter of 1984 through the fourth quarter of 2006, consisting of 92 consecutive quarters. However, in light of an insufficient number of analysts at the industry level in earlier years, we conduct our main analysis using data starting from 1988.

To alleviate undesirable influences from irregular data points and in consideration of compatibility of quality and timing of earnings forecasts made by different analysts, we impose several requirements on the data. They are stated and explained below.

First, we focus on quarterly predictions made up to 90 days, but no less than 30 days, prior to the end of the fiscal quarter being forecasted.Footnote 7 If an analyst makes more than one forecast for that quarter within that time window, only the first forecast is considered. This is to avoid the data complexity arising from continual revisions of earnings estimates by individual analysts. It also alleviates potential distortions from herding among analysts during the latter part of the quarter. We feel that such a restriction of the timing window for earnings forecasts is appropriate and necessary.Footnote 8, Footnote 9 When a firm’s fiscal quarter is different from a regular calendar quarter, but ends in a particular calendar quarter, the earnings estimates of that fiscal quarter are identified with that calendar quarter.

Second, we require that each firm in the sample must have earnings forecasts from at least four different analysts in the quarter immediately prior to the quarter being studied. This consideration follows the examples set by Hilary and Menzly (2006) and Abarbanell and Lehavy (2003). The purpose is to reduce the number of occurrences of unduly large proportional forecasting errors often associated with little known firms followed by only one or two analysts. These firms usually lack reliable financial information.

Third, we require that an analyst make forecasts for at least ten different quarters for a given firm over our sample period in order to be included in our sample for that firm. This is to assure that an analyst’s performance persistence is examined on firms being followed by the analyst on a reasonably steady basis.

Fourth, we cap (winsorize) at 1.0 the absolute error of a quarterly earnings forecast after it is deflated by the absolute value of the corresponding actual earnings number (i.e., quarterly AFE as defined below). This is to reduce the undue impacts of an actual earnings number that is near zero.

Fifth and finally, in the event that forecasts from an analyst for a firm are missing for some quarters in a year, the average of the remaining available quarterly AFEs is used as the analyst’s yearly AFE.

In analysis reported in later sections, we divide our sample firms into thirteen industry groups based on the first two digits of their SIC numbers, as suggested in Breeden et al. (1989, pp. 243–245, Table 2). These thirteen industries are: Petro (petroleum), Consum (consumer durables), Basic (basic industries), Food (food and tobacco), Constr (construction), Financ (finance and real estates), Capita (capital goods), Transpo (transportation), Utilit (utilities), Textile (textiles and trade), Servic (services), Leisur (leisure), and Other (others which are not included in the preceding twelve groups).

3 Methodology

Suppose that an analyst, j, makes, at time t − 1, a forecast of earnings per share (EPS) of firm k for a quarter ending at time t. We shall denote that forecast as \( {}_{t - 1}EPS_{j,t}^{k} \) and the actual EPS of firm k for the quarter ending at time t as \( EPS_{t}^{k} \). We then measure an analyst’s forecast accuracy for the quarter ending at time t by the absolute value of the ratio of the forecast error to the actual earnings, abbreviated as \( AFE_{t} \) [see Sinha et al. (1997), and Cooper et al. (2001)].Footnote 10 It is formally defined below.Footnote 11

$$ {}_{t - 1}AFE_{j,t}^{k} = \left| \frac{{}_{t - 1}EPS_{j,t}^{k} - EPS_{t}^{k}}{EPS_{t}^{k}} \right|. $$

The formula is an adaptation of the standard statistical measure for forecasting precision, termed “absolute percentage error” (APE), which is commonly employed in statistical modeling and forecasting literature (Hanke and Wichern 2009).

For a firm and a specific quarter, an analyst’s quarterly AFE is computed as defined above. The analyst’s annual AFE for a firm in a year is the average of the four quarterly AFEs of the year. The analyst’s “average annual AFE” in a year is then the average of annual AFEs of all of the firms that the analyst follows in that year. Such a composite measure is needed to assess an analyst’s general intrinsic ability in information acquisition and analysis. It is distinct from unique firm-specific talents that may have limited value to investors in general because of susceptibility to herding or imitations among analysts who follow the same firm. Further, a firm-by-firm assessment of analysts’ relative forecast accuracy is difficult to carry out reliably because of analysts’ frequent turnovers. A relative accuracy measure that is specifically designed to neutralize firm-specific effects is presented and discussed in a later section.
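To make the aggregation above concrete, the following is a minimal sketch of the AFE computation, assuming a pandas DataFrame `fc` of qualified forecast records with hypothetical column names; the cap of 1.0 follows the winsorizing rule stated in Sect. 2.

```python
import pandas as pd

def average_annual_afe(fc: pd.DataFrame) -> pd.Series:
    """Quarterly AFE -> annual AFE per analyst-firm -> average annual AFE per analyst."""
    fc = fc.copy()
    # Quarterly AFE: |forecast - actual| / |actual|, capped (winsorized) at 1.0.
    fc["afe"] = ((fc["eps_forecast"] - fc["eps_actual"]).abs()
                 / fc["eps_actual"].abs()).clip(upper=1.0)
    # Annual AFE for an analyst-firm pair: average of the available quarterly AFEs.
    annual = fc.groupby(["analyst", "firm", "year"])["afe"].mean()
    # Average annual AFE of an analyst in a year: average over all firms followed.
    return annual.groupby(["analyst", "year"]).mean().rename("avg_annual_afe")
```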

In a particular year, all analysts selected under our data selection criteria as specified in Sect. 2 are ranked and separated into five equal-sized groups, that is, by their quintiles, based on their average annual AFEs. The quintile group with smallest average annual AFEs is called “quintile 1”, or the “top quintile”, while the quintile group with the largest average annual AFEs is called “quintile 5”, or the “bottom quintile”. The other three quintile groups are defined similarly. After analysts are identified by these five initial quintile ranks in a formation year, we track their quintile ranks in the subsequent five observation years. We started the first tracking cycle by placing the formation year in 1988 and moved the formation year and the trailing “five-year observation window” forward one year at a time till it reached year 2001. Fourteen tracking cycles resulted.
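As an illustration of the yearly quintile assignment, a brief sketch under the same hypothetical data layout as above; each tracking cycle then fixes a formation year and reads off the quintile ranks of the same analysts in the five following years.

```python
import numpy as np

def quintile_ranks(avg_afe):
    """Assign quintile 1 (smallest average annual AFE) through 5 (largest) within each year."""
    # `avg_afe` is assumed to be the Series produced above, indexed by (analyst, year).
    pct = avg_afe.groupby(level="year").rank(pct=True, method="first")
    return np.ceil(pct * 5).astype(int).rename("quintile")
```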

Development of the Markov chain model for our analysis is explained below. In a pure chance model, where accuracy among analysts has no intrinsic difference, an analyst’s quintile rank in the formation year should come from a random draw from integers 1 through 5. This same phenomenon should then repeat itself in the subsequent observation years. This means that the quintile rank of an analyst in subsequent years should have equal probability to fall within each of the five possible quintile ranks. As such, an analyst’s average rank in each of the subsequent observation years, regardless of the initial quintile group membership, should be equal to 3.0 (the mathematical center point). Evidence of a departure from this pure chance model in favor of positive persistence suggests that there are intrinsic differences in analysts’ forecast accuracy and that truly superior/inferior performers exist.

To prepare our discussions in Sect. 5, we describe below the Markov transition probability matrix and its use in obtaining the frequency distribution under the one-step Markov chain model.

Define the transition probability matrix consisting of 6 × 6 elements, i.e., \( P_{ij} \), for a transition from quintile i (ith row) in one year to quintile j (jth column) in the next year as follows.

$$ \mathbf{P}^{(i)} = \begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} & P_{15} & P_{16} \\ P_{21} & P_{22} & P_{23} & P_{24} & P_{25} & P_{26} \\ P_{31} & P_{32} & P_{33} & P_{34} & P_{35} & P_{36} \\ P_{41} & P_{42} & P_{43} & P_{44} & P_{45} & P_{46} \\ P_{51} & P_{52} & P_{53} & P_{54} & P_{55} & P_{56} \\ P_{61}^{(i)} & P_{62}^{(i)} & P_{63}^{(i)} & P_{64}^{(i)} & P_{65}^{(i)} & P_{66}^{(i)} \end{pmatrix} = \begin{pmatrix} P_{1\cdot}^{\prime} \\ P_{2\cdot}^{\prime} \\ P_{3\cdot}^{\prime} \\ P_{4\cdot}^{\prime} \\ P_{5\cdot}^{\prime} \\ P_{6\cdot}^{(i)\prime} \end{pmatrix} $$
(1)

where \( P_{6j}^{(i)} = (0.05)\left[ P_{ij} \big/ \sum\nolimits_{\ell = 1}^{5} P_{i\ell} \right] \) and \( P_{66}^{(i)} = 0.95 \), for i, j = 1, 2, 3, 4, 5, with \( \sum\nolimits_{j = 1}^{6} P_{ij} = 1 \) for i = 1, 2, 3, 4, 5, and where the index “i” represents the initial quintile rank. Entries in the last (6th) row and the last (6th) column of the matrix represent the probabilities of missing observations (due to temporary or permanent exit of analysts from the sample). Because of the programming complexity in calculating precise frequencies for the entries in the 6th row, we set \( P_{66}^{(i)} \) to 0.95 (the probability of staying missing in the next year) and distribute the remaining 0.05 over the other elements of the 6th row in proportion to the transition probabilities of the corresponding initial quintile group, i.e., the ith row of the matrix. We also note that in a steady one-step Markov chain process the values of the elements in the first five rows of the transition probability matrix are fixed as an analyst’s quintile number moves from the time origin to the next year, and then to the year after, and so on. These elements are unaffected by an analyst’s initial or subsequent quintile rank. Further, all of the values in the 6th row are set equal to zero for the transition from the formation year to the first observation year.
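A short sketch of how the augmented matrix in Eq. 1 can be assembled under the convention just described; `P5x6` is a hypothetical 5 × 6 array of estimated transition frequencies (rows: initial quintiles 1–5; columns: next-year states 1–6, with state 6 for missing) whose rows sum to one.

```python
import numpy as np

def augment_transition_matrix(P5x6: np.ndarray, i: int) -> np.ndarray:
    """Build the 6x6 matrix P^(i) for analysts whose initial quintile rank is i (1-based)."""
    P = np.zeros((6, 6))
    P[:5, :] = P5x6
    row_i = P5x6[i - 1, :5]
    P[5, :5] = 0.05 * row_i / row_i.sum()   # re-entry probabilities, proportional to row i
    P[5, 5] = 0.95                          # probability of remaining missing
    return P
```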

Let the expected probability arrays for the subsequent years, computed under the one-step Markov chain model (which imposes a constant one-step transition probability matrix beyond the first year), be expressed as

$$ M^{(i)} = \left( {M_{1}^{(i)} ,M_{2}^{(i)} ,M_{3}^{(i)} , \ldots ,M_{t}^{(i)} } \right), $$
(2)

where \( M_{1}^{(i)} = P_{i\cdot} \), and \( M_{t}^{(i)} = \mathbf{P}^{(i)\prime} M_{t - 1}^{(i)} \), i = 1, 2, 3, 4, 5, and t = 2, 3, 4, 5. We define the term “first degree Markov persistence” as a state in which a violation of the equal probability hypothesis takes place in favor of positive persistence of the initial quintile rank, where the null hypothesis is

$$ \begin{aligned} H_{0}^{(1\mathrm{st})}:\; & P_{ij} = P_{ik}, \quad \text{for } j \ne k, \quad i, j, k = 1, 2, 3, 4, 5, \quad \text{and} \\ & P_{i6} = P_{j6}, \quad \text{for } i \ne j, \quad i, j = 1, 2, 3, 4, 5. \end{aligned} $$

In other words, under the null hypothesis \( H_{0}^{(1\mathrm{st})} \), \( P_{ij} = (1 - \bar{P}_{6})/5 \), for i, j = 1, 2, 3, 4, 5, and \( P_{i6} = \bar{P}_{6} \), for i = 1, 2, 3, 4, 5, where \( \bar{P}_{6} \) is the estimated average probability of missing observations in the first observation year across all five initial quintile groups.

Further, to test the conformity of the observed frequency distributions of quintile ranks in five subsequent years to the one-step Markov chain model, we define the state of “second degree Markov persistence” as a rejection, in favor of a higher degree of positive persistence, of the null hypothesis listed below.

$$ H_{0}^{(2\mathrm{nd})}: M_{t}^{(i)} = M_{t}^{(i)*}, \quad i = 1, 2, 3, 4, 5 \quad \text{and} \quad t = 2, 3, 4, 5, $$

where \( M_{t}^{(i)} \) is the array of observed relative frequencies, while \( M_{t}^{(i)*} \) is the array derived from the one-step Markov chain model, based on the one-step transition probability matrix P (i) as described in Eq. 2. In summary, a state of second degree Markov persistence can occur only after the first degree Markov persistence is established, for the former is at a higher level of persistence.
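The expected arrays \( M_{t}^{(i)*} \) can be generated by iterating the recursion of Eq. 2. A minimal sketch, assuming `P` is a 6 × 6 matrix like the one assembled earlier:

```python
import numpy as np

def expected_distributions(P: np.ndarray, i: int, horizon: int = 5):
    """Return [M_1, ..., M_horizon] under the one-step Markov chain model, for initial quintile i."""
    M = P[i - 1, :].copy()      # M_1 = P_i., the i-th row of the transition matrix
    out = [M]
    for _ in range(2, horizon + 1):
        M = P.T @ M             # M_t = P^(i)' M_{t-1}
        out.append(M)
    return out
```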

In testing the first hypothesis, we use a χ2 test statistic on a two-way contingency table. In the second test, we use a one-way χ2 test for the equality of an observed discrete frequency distribution to its theoretical counterpart. The latter procedure applies to the individual frequency distribution of each initial quintile group and each of the five observation years.
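Both tests are standard; a sketch using SciPy is given below, where `counts` is the hypothetical 5 × 6 table of observed transition counts and `observed`/`expected` are matching frequency arrays (scaled to the same total) for one initial quintile group and one observation year.

```python
import numpy as np
from scipy.stats import chi2_contingency, chisquare

def test_first_degree(counts: np.ndarray) -> float:
    """Two-way contingency test of the equal-probability (pure chance) hypothesis."""
    _, p_value, _, _ = chi2_contingency(counts)
    return p_value

def test_second_degree(observed: np.ndarray, expected: np.ndarray) -> float:
    """One-way goodness-of-fit test of observed vs. one-step Markov frequencies."""
    _, p_value = chisquare(f_obs=observed, f_exp=expected)
    return p_value
```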

Further, we define rank(k, t) as the average quintile rank value at time lag t (the t-th observation year) for the group of analysts initially ranked in the k-th quintile, and define

$$ x\left( {k,t} \right) = \left| {{\text{rank}}\left( {k,t} \right) \, - \, 3.0} \right|,\quad t \, = \, 0, \, 1, \, 2, \, 3, \, 4, \, 5, $$
(3)

where t = 0 refers to the formation year, and x(k, 0) = ∣rank(k, 0) − 3∣ = ∣k − 3∣. Further, let

$$ y\left( {k,t} \right) \, = \, x\left( {k,t} \right) \, / \, x\left( {k,t - 1} \right),\quad t \, = \, 1, \, 2, \, 3, \, 4, \, 5. $$

We define

$$ \beta_{1} = y\left( {k,1} \right), $$
(4)

and

$$ \beta_{2} = {\text{geometric}}\,{\text{average}}\,\left[ {y\left( {k,2} \right), \, y\left( {k,3} \right), \, y\left( {k,4} \right), \, y\left( {k,5} \right)} \right]. $$
(5)

The term β1 is the convergence rate of the average quintile rank to the median rank (i.e., 3.0) from t = 0 to t = 1, while β2 is the geometric average of the convergence rates of the average quintile rank from t = 1 to t = 5, both for the analysts initially in the k-th quintile group. Under \( H_{0}^{{(2{\text{nd}})}} \), the value of β1 is approximately equal to β2. On the other hand, a violation of \( H_{0}^{{(2{\text{nd}})}} \)in favor of a stronger positive persistence beyond the first observation year (i.e., exceeding the first degree Markov persistence) implies β2 > β1. Testing the equality of the two parameters can shed light on whether the second degree Markov persistence exists.
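A minimal sketch of the β1 and β2 calculations of Eqs. 3–5, where `avg_rank` is a hypothetical length-6 sequence of the average quintile rank of an initial quintile-k group at t = 0 through t = 5:

```python
import numpy as np

def beta1_beta2(avg_rank):
    x = np.abs(np.asarray(avg_rank, dtype=float) - 3.0)   # x(k, t) = |rank(k, t) - 3.0|
    y = x[1:] / x[:-1]                                    # y(k, t) = x(k, t) / x(k, t - 1)
    beta1 = y[0]                                          # Eq. 4: convergence rate from t = 0 to t = 1
    beta2 = np.exp(np.mean(np.log(y[1:])))                # Eq. 5: geometric average of y(k, 2), ..., y(k, 5)
    return beta1, beta2
```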

Another way to look at the second-degree Markov persistence is through a multiple-component Markov process, in which multiple components (categories) of hidden factors, embedded among analysts or within each analyst, are mixed in a random fashion to steer the observed transition probability matrix. For simplicity, we use a two-component model for analysis. A straightforward way to estimate the structure of the model is to examine the pattern of convergence of the average quintile rank toward three over the five observation years for the extreme (the top and the bottom) initial quintile groups. The two-component Markov model implies the following pattern of convergence of the average quintile rank toward three over the five subsequent observation years.

$$ f(p,a,b) = x(k,t)/x(k,0) = pe^{ - at} + (1 - p)e^{ - bt} ,\quad k \, = \, 1, \, 5;\quad t \, = \, 0, \, 1, \, \ldots , \, 5, $$
(6)

where a > b and the quantity x(k, t) has been defined in Eq. 3. In the above equation the quantity p is the probability assigned to the first component, which has an exponential convergence parameter “−a” associated with some “transient” factors. The remainder of the probability is assigned to the second component, which has a convergence parameter “−b” associated with some “long-lasting” factors. The values \( e^{-a} \) and \( e^{-b} \) represent the convergence rates toward the median rank, three, for the “transient” and the “long-lasting” factors, respectively.Footnote 12
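Section 6 fits Eq. 6 with a hybrid adaptive splines algorithm; as a simpler stand-in for illustration only, the mixture can also be fitted by ordinary nonlinear least squares. A sketch, where `x_ratio` holds hypothetical values of x(k, t)/x(k, 0) for t = 0, …, 5 of one extreme initial quintile group:

```python
import numpy as np
from scipy.optimize import curve_fit

def two_component(t, p, a, b):
    """Eq. 6: mixture of a transient (rate a) and a long-lasting (rate b) exponential component."""
    return p * np.exp(-a * t) + (1.0 - p) * np.exp(-b * t)

def fit_two_component(x_ratio):
    t = np.arange(len(x_ratio), dtype=float)              # t = 0, 1, ..., 5
    (p, a, b), _ = curve_fit(two_component, t, x_ratio,
                             p0=[0.5, 2.0, 0.1],          # rough, illustrative starting values
                             bounds=([0.0, 0.0, 0.0], [1.0, 10.0, 10.0]))
    return p, a, b                                        # expect a > b (transient decays faster)
```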

Details of the computational procedures for the defined terms and relevant test statistics under the two hypotheses and for the two-component Markov chain model outlined above are provided in later sections.

4 Descriptive statistics and distributions of quintile ranks

We first tabulated, before the data reductions explained in Sect. 2, the frequency distribution of the number of analysts who have ever covered a given number of firms/industries over the period 1984–2006. A total of 415,654 data points (quarter-analyst-firm triplets) are involved. We observed that 40% of analysts cover only one industry, while 17% of analysts cover more than three industries, with the average number of industries covered by an analyst equal to 2.25. Furthermore, we note that 13% of analysts cover only one firm, while about 30% cover up to three firms. On the other hand, about 28% of analysts cover more than ten firms. The average number of firms ever covered, at one time or another, by an analyst is 8.8. The number of distinct analysts involved in our qualified sample is 2,655.

We then constructed summary statistics of analysts’ average annual AFEs and the number of analyst-firm-year triplets in our data sample, cross-tabulated by industry and by year. Some noticeable characteristics of the data are summarized below. First, the value of the average AFE, aggregated over the 23-year sample period, varies substantially across industries, spanning a range from 0.15 (for finance) to 0.32 (for petroleum). Two industries (petroleum and transportation) post average AFE values above 0.25, while the remaining eleven industries post values below 0.25. Second, the AFE value, averaged over the thirteen industries, for each of the 23 years, is close to 0.22, except for the periods 1994–1997 and the years 2000, 2005, and 2006, during which the average AFE value is around 0.19. This exhibits the time-varying nature of the AFE. Third, basic, finance, and consumer durables have the highest overall level of analyst participation, as is reflected by the number of analyst-firm-year triplets recorded in the sample, while other, construction, and transportation have the lowest level of analyst participation.Footnote 13 The three characteristics outlined above suggest the need to separate analysts by industry. Further, a quintile ranking system that focuses on the relative performance among analysts in a year is a natural way to normalize the impacts of the time-varying level of the average AFE over different years.Footnote 14

Table 1 displays, for the overall dataset as an illustration of our analytic procedure, the frequency distributions of quintile ranks in five successive observation years averaged over the fourteen tracking cycles from 1988 through 2001.Footnote 15 The entries in this table show that the probability of staying in the initial quintile rank is about 40% for the two extreme (the first and the fifth) initial quintile groups in the first observation year. This probability declines to about 17% (32% when excluding missing observations) in the fifth year. The probabilities of missing observations climb steadily from about 15% to about 45% in 5 years for the two extreme quintile groups. On the other hand, the probabilities of missing observations move up from about 8% to about 41% in 5 years for the middle three initial quintile groups.Footnote 16

Table 1 Frequency distributions of analysts’ accuracy quintile ranks in five subsequent years after the initial rank in a formation year (overall AFE data—up to 2006—forecasting day within −90 to −30 of the end of a fiscal quarter)

As previously noted, the percentage frequency distributions exhibited in Table 1 are the average of fourteen sets of such distributions, with the formation year starting in 1988 and ending in 2001. In each replication of the tabulation, the average rank value was computed, and the fourteen average rank values were used to conduct a test against the hypothesis of a true mean equal to 3.0 under the pure chance model. The t-values so computed are listed in Table 1 on the last row of each panel. All t-test results, except for the third quintile group, are statistically significant at less than the 1% (two-sided) level. This indicates a strong persistence in analysts’ performance for the overall dataset.

As evidenced by our discussion earlier, the average accuracy of earnings forecasts in terms of AFEs varies depending on the firm’s industry. This then necessitates a separate analysis for each industry. To begin, we compiled the number of analysts involved in a quintile group for each of the thirteen industries, and in each formation year, from 1988 through 2001.Footnote 17 We observed that a quintile size (i.e., the number of analysts in a quintile) ranges from 6 to 110. The average number of analysts in a quintile group, averaged over fourteen formation years, ranges among the industries from 12.1 (for transportation industry) to 77.2 (for capital goods industry). Moreover, the number of analysts in a quintile group in the overall dataset rises from 124 in 1988 to 375 in 2001, a 202% increase.

A frequency table like Table 1, giving the distributions of quintile rank in five observation years, was produced for each of the thirteen industries. For economy of space, we present in Table 2 only the key results from the test of the pure chance hypothesis for the initial top and bottom quintile groups. The main entries in the table are the values of the average quintile rank, one for each industry-observation-year combination, with the t statistic (against the null hypothesis that the average rank is equal to 3.0) listed in parentheses immediately beneath. Results exhibited in Table 2 suggest that quintile rank persistence among analysts is strong in nearly all industries, with the strongest cases in basic industries, finance, capital goods, textiles, and services; transportation is the only marginal case.Footnote 18

Table 2 Average quintile ranks based on AFEs in each of the 5 years subsequent to the formation year for the two extreme initial quintile groups for each of the thirteen industries

5 Markov persistence for thirteen industries

In search of further insights, we examined in more depth the stochastic properties of analysts’ performance persistence. The two sets of tests against \( H_{0}^{{(1{\text{st}})}} \) and \( H_{0}^{{(2{\text{nd}})}} \) outlined in Sect. 3 were used to conduct the analysis. Table 3 presents the results of such analysis for the overall dataset. Panel A of Table 3 provides the estimated one-year Markov transition probability matrix using the elements contained in Table 1. The numbers on or near the diagonal positions in the matrix reveal that the probability of an analyst remaining in or close to his/her initial quintile group is uniformly and substantially above that predicted by the equal probability model, that is, \( \left( {1 - P_{i6} } \right)/5 = (1 - \bar{P}_{6} )/5 \), or approximately 0.177.

Table 3 Tests of conformity of observed frequency distributions of quintile ranks to the pure chance transition model and the one-step Markov chain model (overall data—based on average AFEs)

We also observe, as previously noted, that the probabilities of missing observations for the middle three quintiles (between 8 and 11% as displayed on the last column) are measurably lower than those for the first and the fifth quintile (16 and 14%, respectively). This pattern of frequency distribution appears to link analysts’ career mobility (moving up or out from an analyst position) to their relative performance in forecast accuracy.

A two-way contingency table test is used to examine whether the computed transition probability matrix conforms to the equal probability hypothesis \( H_{0}^{{(1{\text{st}})}} \). The χ2 test statistic thus computed carries an exceedingly large value of 1,913.2, giving virtually a zero p-value.Footnote 19 This result shows a strong first-degree Markov persistence in analysts’ performance in earnings forecasts for the overall dataset.

Panels B and C of Table 3 provide the expected (under \( H_{0}^{(2nd)} \)) and the observed frequency distributions of quintile rank, one for each of the five observation years, for the two extreme initial quintile groups. (Details for the middle three initial quintile groups are omitted for economy of space.) The expected frequency distribution is obtained through the use of Eq. 2 explained in Sect. 3. Results from a series of χ2 tests are listed in Panel D of Table 3. For each initial quintile group and a particular observation year, we computed a χ2 statistic to test the conformity of the observed relative frequencies to the expected relative frequencies (one-way contingency).Footnote 20 The p-values of the χ2 test statistics are provided for the second through the fifth observation year for each of the five initial quintile groups. The test results are exceedingly significant for the two extreme initial quintile groups where they really count. This result suggests a strong second-degree Markov persistence in analysts’ accuracy in earnings forecasts for the overall dataset.

Because the nature and degree of difficulty in earnings forecasting may vary from industry to industry, we carried out the same analysis as presented in Table 3 for each of the individual thirteen industries. These results are displayed in Table 4. Two main findings emerge and are outlined below.

Table 4 Tests of conformity of observed frequency distributions of quintile ranks to the pure chance transition model and the one-step Markov chain model for thirteen industries (based on AFEs)

First, results of analysis for all thirteen industries exhibit strong first-degree Markov persistence. The pure chance hypothesis for analysts’ relative forecast accuracy is rejected at exceedingly low significance levels for each of the thirteen industries.

Second, the second-degree Markov persistence is confirmed for the following five industries: basic industries, finance, capital goods, textiles, and services, occurring mainly with the initial top quintile group.

6 The two-component Markov chain model and the behaviors of β1 and β2

Analysis using β1 and β2, as explained in Sect. 3, has several benefits beyond the analysis presented in Sect. 5. First, it condenses the various sets of test statistics contained in Tables 3 and 4 to two parameters. Second, it captures some important characteristics of the Markov persistence: the rate of convergence of the average quintile rank toward the median rank (i.e., 3) in successive observation years, from the initial quintile rank. Third, the values of the two parameters can be displayed in the form of time series to exhibit the strength of persistence over time. And fourth, a simple test on the equality between β1 and β2 (as explained in Sect. 3) can shed light on the existence of the second-degree Markov persistence.

We calculated the values of β1 and β2 for each combination of formation year (1988–2001, 14 formation years) and industry (13, plus the overall dataset), and for each of the two extreme quintile groups. Because of the large volume of numerical results, we summarize below important observations distilled from the results.

First, for the overall dataset, the values of both β1 and β2 for the two extreme initial quintile groups exhibit a stable and horizontal time pattern, with the average values equal to approximately 0.54 and 0.85, respectively. The observed values of β1 and β2 over the fourteen tracking cycles are within a ±15% range of the average values. An analysis of variance test confirms the hypothesis of a constant value for both β1 and β2.

Second, a similar horizontal pattern, but at different average levels and with larger proportional variations over time, is observed for both β1 and β2 within individual industries. A special analysis of the stationarity of the two parameters for the two extreme initial quintile groups and for each industry group was carried out. The general finding is that there is little evidence to suggest a significant degree of non-stationarity, or a discernible non-horizontal time pattern, in the two parameters for any of the thirteen industries. The proportional variations over time around the average parameter values are largely within a ±25% range.

Third, tests of the equality of the values of the two parameters β1 and β2 for the overall dataset and for each of the thirteen industries were carried out. The tests were based on the estimates over the fourteen tracking cycles. The test results confirm the second-degree Markov persistence for the overall dataset and the following seven industries: consumer durables, basic industries, construction, finance, capital goods, textiles, and leisure. Four out of the seven industries identified here are also identified by the tests against \( H_{0}^{{(2{\text{nd}})}} \) presented in Sect. 5.

The findings reported in the first two points above are important, and in fact remarkable, in the sense that the two measures characterizing the accuracy-rank persistence pattern appear to be stable over the sample period 1988–2006. This holds not only for the overall dataset, but also for the data of individual industries. It may imply that the stochastic dynamic behavior of analysts’ forecasting activities within the investment community stayed remarkably stable throughout the sample years. From an analytical point of view, this stationarity enables us to drastically simplify our analysis by aggregating data and measured variables across the time dimension. In that case, we can focus our attention on the cross-sectional characteristics of individual industries and explore the driving factors that are peculiar to each particular industry.

As explained in the immediately preceding paragraph, we can now conduct a more in-depth analysis of individual industries based on data aggregated over the period 1988–2006. In this case, we conduct a factor-partition analysis employing the two-component Markov chain model. We estimated the proportional weight placed on the transient and the long-lasting component of factors and then computed the respective strength of each of the two components in steering the convergence rate of the average quintile rank for analysts who participate in an identified dataset. In these estimations, we fitted the function f(p, a, b) from Eq. 6 to the values of the average quintile rank reported in Table 2 using a hybrid adaptive splines algorithm (Lou and Wahba 1997). The results of the estimation are displayed in Table 5.

Table 5 Estimated parameter values for a two-component Markov chain model (based on AFEs)

From the entries on the last row in Table 5, we observe that, for the overall dataset, the weight taken by the transient component, p, averaged over the two extreme initial quintile groups, is about 58%, while the long-lasting factors capture the remaining 42% weight. Further, the value of the parameter “a” is 2.38 for quintile 1 and 4.26 for quintile 5, meaning a steep exponential convergence ratio of \( e^{-a} = 0.093 \) and 0.014, respectively, for the transient factors in the two extreme groups. On the other hand, the value of the parameter “b” is about 0.12, averaged over the two extreme quintile groups, implying \( e^{-b} = 0.887 \), a slow exponential declining rate for the long-lasting factors. In other words, the long-lasting component lasts more than ten times as long as the transient component, i.e., roughly \( (e^{-b})^{10} \approx e^{-a} \). Furthermore, the values of β1 and β2 calculated based on the estimated function f(p, a, b) are displayed on the last line, in the right compartment of each of the two panels of Table 5. They are about 0.45 and 0.85, again averaged over the two extreme quintile groups, respectively.

The same analysis was individually carried out for each of the thirteen industries and the results are separately tabulated in the main body of Table 5 for each of the top and the bottom quintile group. It is seen that the estimated values of the parameters a, b, and p for different industries vary markedly. This implies that the nature and the strength of factors included in the transient and long-lasting components that affect analysts’ forecast accuracy vary substantially among different industries. Several significant observations on the values of these parameters, as well as on the derived values of β1 and β2, are highlighted below.

First, among different industries, a relatively low value of “p” means weaker influences of the transient factors, or equivalently, stronger impacts of the long-lasting factors. Industries with relatively low p values are basic industries, capital goods, and utility.

Second, a smaller value of \( e^{-a} \) (larger value of a) coupled with a larger value of \( e^{-b} \) (smaller value of b) results in stronger, that is, second-degree, persistence. Again the three industries mentioned in the first point above serve as examples of this condition.

Third, a large value of β1 coupled with a large value of β2, the latter usually being close to 1.0, implies a strong first-degree Markov persistence. In addition, a large differential between the values of β1 and β2 enhances the second-degree Markov persistence.

Fourth, the estimated values of the parameters for the initial top quintile group (left panel of Table 5) and those for the initial bottom quintile group (right panel) are in fairly close agreement for the vast majority of the industries. This implies that the persistence is fairly symmetric on either side, that is, for superior or inferior performance.

7 Factors influencing analysts’ performance

To understand the factors that may explain accuracy differentials among analysts, we chose to conduct a special study that links analysts’ quintile rank scores for the last 3 years in our sample period (2004–2006) to characteristics of the analysts and the firms they follow.Footnote 21 The quintile rank score of an analyst over the three sample years is calculated by summing the analyst’s quintile rank number in each of the 3 years. For instance, an analyst whose average AFE is ranked in the first quintile for each of the 3 years has a total quintile rank score of 3. Thus, an analyst’s quintile rank score for the 3 years ranges from 3 to 15, inclusive. We call this score SCORE-A (quintile rank score based on average AFE).Footnote 22
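A brief sketch of the SCORE-A construction, reusing the hypothetical yearly quintile-rank Series from the Sect. 3 sketch and keeping, as a rough approximation of the full-participation screen described later in this section, only analysts ranked in all 3 years:

```python
def score_a(quintile, years=(2004, 2005, 2006)):
    """Sum an analyst's quintile ranks over the three sample years (range 3-15)."""
    q = quintile[quintile.index.get_level_values("year").isin(years)]
    counts = q.groupby("analyst").size()
    totals = q.groupby("analyst").sum()
    return totals[counts == len(years)].rename("SCORE_A")
```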

In our attempt to identify factors that may explain analysts’ relative forecasting accuracy, we noted in the extant literature several variables employed in related work. In the research papers by Clement (1999), Jacob et al. (1999), Mikhail et al. (1997, 2003), Ghosh and Whitecotton (1997), Abarbanell and Lehavy (2003), and Stickel (1992), three variables were used. They are: (1) analyst’s workload, (2) analyst’s length of experience, and (3) the size of the brokerage house that employed the analyst.

The reasoning behind the selection of these three variables is fairly intuitive. To achieve higher accuracy in earnings forecasts for a firm, an analyst needs to devote more time and effort to analyzing the firm. But a heavy workload, in terms of the number of firms covered by the analyst, will constrain the analyst’s ability to carry out the task thoroughly for each firm. Further, the longer an analyst’s experience in forecasting earnings for a firm, just as in most professional careers, the better the analyst is expected to perform. Finally, the more resources and support an analyst has, the greater the forecasting accuracy the analyst is likely to achieve. And it is often the case that larger brokerage houses are more resourceful.

In addition, Brown (1997) and Ho (2004) pointed out that an analyst’s forecast accuracy is inversely related to the capitalization size of the firm being followed. That is, analysts’ forecasts for larger firms tend to be more accurate than for smaller firms, holding other things constant. This is often because of the better quality and accessibility of financial information afforded by larger firms.

We also noticed that high-growth firms tend to attract more attention from major investors and prominent analysts. Closer scrutiny and wider dissemination of the firm’s financial information in the marketplace may help analysts achieve better forecasting accuracy.

To prepare for the empirical investigation, the variables discussed above that characterize analysts and the firms that they follow are defined and explained in detail below.

FN: an analyst’s workload, proxied by the average number of firms followed by that analyst in the 3 year period. Specifically, it is the ratio of the total number of quarter-firm counts to the number of quarters for which the analyst provided earnings forecasts in the years 2004–2006.

ASIZE: the natural logarithm of the overall average of the capitalization sizes of the firms which were included in our data sample and followed by the analyst, computed at the beginning of each quarter, observed in the 3 year period.

SB: the size of the brokerage house which employed the analyst, proxied by the average number of analysts affiliated with the firm in the 3 year period, calculated at the beginning of each year.

AE: an analyst’s length of experience prior to the beginning of year 2004, proxied by the number of quarters from the first recorded participating quarter to the last in our sample data over the period from the second quarter of 1984 through the fourth quarter of 2003, ignoring discontinuity.Footnote 23

FGR: the average annual compound growth rate prior to the end of year 2003 in the capitalization size of the firms that were included in our sample and were followed by the analyst in the 3 year period.

Among the five variables described above, FN, SB, and AE are analyst-related, while ASIZE and FGR are firm-related. By involving both sets of influencing factors in our analysis, we in effect postulate an expansion of the analytic framework in the extant literature. The analysis provides a broader perspective on the factors that influence analysts’ forecasting prowess.

We further note that, in order to be included in our sample for the calculation of the variables defined above, an analyst must fully participate in the forecasting activities. By “full participation” we mean that an analyst must provide at least one quarterly forecast for any firm in our sample in each of the 3 years under study. Analysts who do not meet the requirement are excluded from the sample.

Summary statistics of the variables listed above for our sample of analysts and related firms are provided in Table 6. There are 1,582 analysts meeting the full participation requirement, and each carries a set of values for the variables SCORE-A,Footnote 24 FN, ASIZE, and SB. However, 171 of the 1,582 analysts do not have experience records prior to year 2004, in part due to our data trimming criteria as described in Sect. 2. Further, four analysts are involved with firms which do not have the capitalization numbers necessary for the calculation of the variable FGR within our sample timeframe. This leaves a usable sample of 1,582 − 171 − 4 = 1,407 analysts.

Table 6 Descriptive statistics of analysts’ characteristics

Results of the regression analysis for the overall dataset and each of the thirteen industries for the sample period identified above are displayed on the left panel of Table 7. In this part of the table, we display each estimated regression coefficient with its corresponding t-statistic in parentheses immediately to the right. From the results for the overall dataset, placed in the bottom row of the panel, we observe several significant phenomena. First, the heavier an analyst’s workload, in terms of the number of firms followed by the analyst, the worse the analyst’s performance, as suggested by the positive sign of the regression coefficient on FN. However, the corresponding t-ratio, 1.41, is not statistically significant at the 5% two-sided level.Footnote 25 Second, the longer an analyst’s experience as recorded in our sample data, the better the analyst performs, as indicated by the negative sign of the regression coefficient on the variable AE. But its t-ratio value, −1.76, is again not statistically significant at the 5% two-sided level. Third, the larger the size of the firm being followed, the more accurately an analyst can (on average) forecast its earnings, as is reflected by a very significant negative regression coefficient, with a t-ratio equal to −11.05, on ASIZE. Fourth, the size of the brokerage firm that employed the analyst does not materially affect the analyst’s relative performance, as evidenced by the near-zero t-ratio, 0.26, on the SB variable. And fifth, a higher average capitalization growth rate of the followed firms has a fairly significant favorable effect on analysts’ relative forecast accuracy, as suggested by a t-ratio of −2.52 on the variable FGR. In summary, we note that the first three observations are consistent (two with correct signs, though insignificant) with the bulk of findings reported in the literature, while the fourth is not, and the fifth is new.Footnote 26

Table 7 Regression and discriminant analysis of factors influencing analysts’ persistence in accuracy in thirteen industry groups and the overall dataset

Observations of greater relevance are found for analysts within each industry, for each industry provides a more homogeneous forecasting environment. As shown by the thirteen sets of regression coefficients displayed on the left panel of Table 7, the strength of each of the five explanatory variables does not apply evenly to all industries. The adverse effect of an analyst’s workload, FN, is most pronounced for firms in consumer durables, basic industries, textiles, services, and other. The firm size (ASIZE) effect is most significant for firms in consumer durables, basic industries, construction, capital goods, transportation, utilities, textiles, services, and other. Further, the capitalization growth rate, FGR, shows significant effects on analysts’ relative accuracy in earnings forecasts for firms in transportation and other. Interestingly, an analyst’s length of experience, AE, as in the case of the overall dataset, does not show a significant effect on the analyst’s relative forecast accuracy in any of the thirteen industries.Footnote 27 The lack of a beyond-borderline significant effect of the size of the brokerage house, SB, is observed for each of the thirteen industries, as is the case for the overall dataset. In summary, some of the markedly diverse observations on individual industries noted above caution us against applying the findings from the overall dataset to each individual industry. Instead, they suggest that the effect of each of the analyst- or firm-related characteristics on analysts’ relative accuracy in earnings forecasts varies considerably across industries.

For completeness of our discussion, we provide in Table 7 a column with the caption “Pr > F” that lists the p-value from the F test for the effectiveness of the entire regression equation in each case under study. The entries in this column show that the five-variable regression equation possesses significant explanatory power, at the 5% significance level, in nine out of the thirteen industries. The four industries that defy explanation by the regression equation are petroleum, food, finance, and leisure. Incidentally, the column with caption “N” lists the number of observations (analysts) involved in the particular industry under study. We remind the reader that an analyst can be involved in more than one industry. The analyst’s performance in each of the different industries is analyzed separately.
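The cross-sectional regression underlying the left panel of Table 7 can be reproduced along the following lines; this is only a sketch, assuming a DataFrame `df` with one row per qualified analyst and illustrative column names.

```python
import statsmodels.formula.api as smf

# A lower SCORE_A means better relative accuracy, so a negative coefficient
# indicates a favorable effect of the corresponding variable.
model = smf.ols("SCORE_A ~ FN + ASIZE + SB + AE + FGR", data=df).fit()
print(model.summary())   # coefficients, t-statistics, and the overall F test ("Pr > F")
```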

We further conducted a discriminant analysis to evaluate the ability of the five explanatory variables discussed in the immediately preceding three paragraphs to identify superior or inferior analysts, as measured by their relative average AFE.Footnote 28 We first identify two extreme groups of performers, in which the best performing group consists of analysts with 3-year total quintile rank scores equal to 3, 4, or 5, while the worst group has scores of 13, 14, or 15. The values of the five explanatory variables of the two groups of analysts were then used to construct and estimate a linear discriminant function. The calculated functional value of each analyst was then used as the marker of that analyst’s ability to accurately forecast earnings. An optimal dividing point of the functional value was determined and used to assign each analyst in the sample to either the “best” or the “worst” group. The effectiveness of the discriminant function in correctly identifying an analyst’s group membership is captured by a test statistic called “Wilks’ λ (lambda)”. A statistically significant Wilks’ λ value means that the chance of correct classification of an analyst’s group membership using the functional value exceeds that of random guessing supplemented by knowledge of the relative size of the two groups.
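A sketch of the two-group discriminant analysis, assuming `X_best` and `X_worst` are hypothetical arrays of the five explanatory variables for the ex post “best” and “worst” groups; Wilks’ λ is computed directly as det(W)/det(T), and the classification rates come from an ordinary linear discriminant function.

```python
import numpy as np
from numpy.linalg import det
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def wilks_lambda(X_best, X_worst):
    """Wilks' lambda = |within-group scatter| / |total scatter| for the two groups."""
    X = np.vstack([X_best, X_worst])
    T = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))
    W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in (X_best, X_worst))
    return det(W) / det(T)

def classification_rates(X_best, X_worst):
    """Fraction of 'best' and 'worst' analysts correctly classified by a linear discriminant."""
    X = np.vstack([X_best, X_worst])
    y = np.r_[np.ones(len(X_best)), np.zeros(len(X_worst))]
    pred = LinearDiscriminantAnalysis().fit(X, y).predict(X)
    return (pred[y == 1] == 1).mean(), (pred[y == 0] == 0).mean()
```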

On the right panel (the rightmost three columns) of Table 7, we display relevant analyst counts and the results from the discriminant analysis for each of the thirteen industries and for the overall dataset. The captions “N of Best” and “N of Worst” represent the number of observations (analysts) in the best and the worst groups, respectively, as previously defined. For instance, the results displayed on the bottom row of the table for the overall dataset show that there are 262 analysts in the “best” group and 273 analysts in the “worst” group. The value of the Wilks’ λ computed based on the covariance matrix of the five explanatory variables for the two groups in question has a p-value of less than 0.0001, indicating that the discriminant function works effectively in separating superior analysts from inferior analysts, much better than random guessing. Our analysis also reveals that the five explanatory variables provide a 72% correct classification rate for analysts who are in the ex post “best” group, and a 55% correct classification rate for those in the ex post “worst” group (details not shown in Table 7). As these numbers suggest, the discriminant function does a better job of picking out good analysts than of weeding out bad ones.

The key test result of the discriminant analysis for each of the thirteen industries, the p-value of the Wilks’ λ, is displayed on the rightmost column in Table 7. The numbers reported therein show that the effectiveness of the discriminant function is statistically significant at the 5% level for ten out of the thirteen industries, closely matching the industries that reveal significant F-test statistics for the five-variable regression equation.

In summary, the findings reported in this section provide useful information about which analyst- and firm-related characteristics are significantly related to analysts’ forecast accuracy, in what fashion, to what extent, and in which industries.

8 An alternative scoring method and the empirical results

Our analysis thus far relies on the measure of annual AFE averaged over all of the firms followed by an analyst in a particular year. A potential shortcoming of that measure is its sensitivity to environmental factors that put analysts’ forecasting accuracy on an uneven footing, such as the size and complexity of the business, or businesses, of the firms being followed. Therefore, in order to assess the robustness of the results reported in the previous sections and, more importantly, to gain further insight into the persistence of analysts’ relative accuracy in earnings forecasts, we employed an alternative measure that neutralizes all firm-related factors in the performance evaluation. The measure ranks all qualified analysts’ AFEs for a firm in a particular year and converts the ranking into a sequence of fractional numbers equally spaced from zero to one.Footnote 29, Footnote 30 Specifically, we define

$$ {}_{t - 1}R_{j,t}^{k} = \frac{{}_{t - 1}I_{j,t}^{k} - 1}{N_{t}^{k} - 1}, \quad j = 1, \, 2, \ldots , N_{t}^{k} $$

where \( N_{t}^{k} \) is the number of analysts providing forecasts for the earnings of firm k in year t, and \( {}_{t - 1}I_{j,t}^{k} \) is the rank of analyst j’s AFE, i.e., \( {}_{t - 1}AFE_{j,t}^{k} \), among all participating analysts’ AFEs for firm k in year t, running from 1 to \( N_{t}^{k} \) (except for ties, which are treated as explained below), with 1 being the best and \( N_{t}^{k} \) the worst.

In the calculation of \( {}_{t - 1}I_{j,t}^{k} \), in case of a tie we use the average of the ranks that would otherwise be assigned to the observations. For instance, when three analysts provide identical forecasts for a firm for each of the four quarters in a year, the \( {}_{t - 1}I_{j,t}^{k} \) value assigned to each of the three analysts is 2, the average of the three rank numbers, 1, 2, and 3, that would otherwise be assigned had their forecasts all been distinct. In the special case where only one qualified analyst provides forecasts for the firm in a year, that analyst’s \( {}_{t - 1}R_{j,t}^{k} \) is set equal to 0.5. In analyzing an analyst’s performance in a year, we average the R-scores across all of the firms that the analyst follows in that year. In short, in the new analysis we substitute the average R-score for the average AFE value used in the previous sections.
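A minimal computational sketch of this R-score, with ties handled by average ranks, is given below; the numerical AFE values are hypothetical and serve only to illustrate the tie-handling rule.

import numpy as np
from scipy.stats import rankdata

def r_scores(afe):
    """Map one firm-year's qualified-analyst AFEs to equally spaced scores in [0, 1]."""
    afe = np.asarray(afe, dtype=float)
    if len(afe) == 1:                          # lone analyst gets the neutral score
        return np.array([0.5])
    ranks = rankdata(afe, method="average")    # ties receive the average rank
    return (ranks - 1) / (len(afe) - 1)

print(r_scores([0.02, 0.05, 0.05, 0.05, 0.10]))
# -> [0.  0.5 0.5 0.5 1. ]; the three tied analysts share the average rank 3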

An advantage of this new average R-score measure is that it neutralizes all firm-specific factors, which tend to vary from firm to firm, and instead focuses on analysts’ relative intrinsic abilities within the setting of a single firm. The new measure also has potentially significant drawbacks. Firstly, the dispersion of the R-scores for a firm depends on the number of participating analysts and is somewhat oblivious to the practical significance of the magnitudes of earnings forecast errors. For instance, when there are only two participating analysts in a firm, the spread of the R-scores between the two analysts is automatically 1, regardless of how trivial the discrepancy between the two forecast numbers is. On the other hand, a far-off-the-mark forecast by an analyst in a group of, say, 20 analysts will not result in a severe penalty against the analyst, since the worst R-score the analyst can receive is 1. Secondly, the measure based on the R-score tends to oversimplify, spacing differences in relative accuracy artificially in multiples of \( 1/(N_{t}^{k} - 1) \).

We recognize, however, that a completely satisfactory measure of analysts’ relative forecast accuracy has yet to be developed. Keeping this in mind, we present below results based on the R-score, using the same dataset and methodology as in the previous sections.

We compiled the frequency distribution of the number of qualified analysts who ever participated in a firm in our sample (not shown here, for economy of space). From the data, we observed that about 30% of firms in our sample have three or fewer qualified participating analysts.Footnote 31 On the other hand, about 10% of the sample firms have 27 or more distinct participating analysts over the sample years. We also note that the average number of analysts per firm is 10.7, over a total of 2,934 firms.

We replicated the analysis presented in Table 3 for the overall dataset using the new measure and observed that the newly compiled one-step (1-year) transition probability matrix exhibits patterns remarkably similar to those in Table 3, albeit with much weaker persistence. The chi-square test statistic against the pure chance hypothesis, equal to 258.3, is still highly significant. However, the new results on testing \( H_{0}^{{(2{\text{nd}})}} \), that is, the one-step Markov chain model, comparable to those in Panel D of Table 3, show little sign that the overall dataset carries second degree Markov persistence in analysts’ relative forecast accuracy when it is measured by the R-score.
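For concreteness, a chi-square test of a 5 × 5 quintile-rank transition count matrix against the pure chance hypothesis could be carried out along the following lines. This sketch assumes the null fixes every transition probability at 1/5; the count matrix shown is a toy placeholder, not Table 3’s data, and the paper’s exact degrees-of-freedom convention may differ.

import numpy as np
from scipy.stats import chi2

def pure_chance_test(counts):
    """counts[i, j]: number of analysts moving from quintile i+1 to quintile j+1."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum(axis=1, keepdims=True) / counts.shape[1]   # 1/5 of each row total
    stat = ((counts - expected) ** 2 / expected).sum()
    dof = counts.shape[0] * (counts.shape[1] - 1)                    # 5 rows x 4 free cells = 20
    return stat, chi2.sf(stat, dof)

counts = np.full((5, 5), 40) + np.diag([30, 10, 0, 10, 30])          # toy matrix with mild persistence
print(pure_chance_test(counts))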

We then redid the analysis of Table 4 based on the R-score for individual industries and present the results in Table 8. The entries therein (equivalent to Panel A in Table 4) reveal that, at the 5% significance level, analysts’ relative forecasting accuracy maintains the property of “first degree Markov persistence” in nine of the thirteen industries: petroleum, consumer durables, basic, finance, capital goods, utilities, textiles, services, and leisure. However, the χ2 test results comparable to those in Panels B and C of Table 4 reveal that “second degree Markov persistence” is no longer present in any of the thirteen industries. This may not be surprising, because the new measure, as pointed out above, tends to be overly simplistic and carries certain distortive effects. Because of these drawbacks, detecting analysts’ performance differentials based on the average R-score becomes more difficult.

Table 8 Tests of conformity of observed frequency distributions of quintile ranks to the pure chance transition model (based on the R-score)

In short, the fact that analysts’ relative accuracy sustains first degree Markov persistence in nine of the thirteen industries under the R-score measure is, in our view, rather remarkable.

9 Summary and concluding remarks

In this paper we present a systematic analysis using a Markov chain model that casts new light on the nature, stochastic properties, and the level of persistence in analysts’ relative accuracy in earnings forecasts. The concepts of “first degree Markov persistence” and “second degree Markov persistence” are formulated to distinguish between two levels of performance persistence. The two levels of persistence are then linked to the two-component Markov chain model, which identifies the proportion and the strength of each of the two classes of “transient” and “long-lasting” factors that influence the superiority/inferiority of analysts’ performance. We also define and use the parameters β1 and β2 to characterize the convergence rate of an analyst’s average quintile rank—from an initial quintile rank toward the median rank—in successive observation years. These two characterization parameters are then used to study the stationarity of the transition probabilities of quintile ranks. They are also used to examine the existence of the second degree Markov persistence in analysts’ relative forecast accuracy.

We find from our empirical results that analysts’ workload and the size and growth rate of the firms followed are among the long-lasting influencing factors. We also find that the strength of each of these long-lasting factors varies markedly from one industry to another. Overall, the results provided in this paper demonstrate that the spread and persistence of performance differentials among analysts are remarkably strong and deep-rooted across industries. The results are also fairly robust to a more stringent (perhaps overly rigid) measure, termed the R-score, formulated to neutralize all firm-related effects.

In the context of the analysis presented in this paper, several issues warrant further attention and research.

First, in practice there is no unified timing for submitting earnings forecasts; the timing varies widely across analysts, quarters, and firms, so a strict comparison of analysts’ accuracy on an equal footing is infeasible. A mechanism is needed to properly adjust accuracy for the timing of forecasts; such a mechanism would hopefully improve the reliability of the analysis of relative forecasting accuracy.

Second, there is considerable turnover of analysts in our data sample. The number of missing data points in the fifth observation year is highest for the initial top and bottom quintile groups. There is a need to track the departure paths (promotions, dismissals, etc.) of the missing analysts. That information may cast more light on the ultimate intrinsic ability of an analyst and on the consequences of being a superior or inferior analyst.

Third, there is a need to understand analysts’ incentives, or lack thereof, in achieving higher accuracy in earnings forecasts. Many other factors (besides the obvious ones explored in this paper) may explain why analysts’ performance tends to persist in a competitive market of investment information services. In an ideal market, competition should quickly weed out inferior analysts, if the sole criterion of job retention is accuracy in forecasting. In the same environment, superior analysts would also lose their edge quickly. But of course the reality is much more complex. What other factors are at work and how do they work in reality? An outline of discussions on this subject is available in Ramnath et al. (2008).

Despite the potential issues inherent in our analysis, this paper presents for the first time in the literature a broad-based analysis demonstrating that analysts’ abilities in earnings forecasts are indeed heterogeneous and that differentials persist over time. We also show that the strength of persistence and the potential influencing factors appear to vary from industry to industry. This provides a concrete basis for future research work in the following directions.

First, the accuracy of composite earnings forecasts can be improved by placing more weight on forecasts made by identified superior analysts, in contrast to the conventional equal-weighted consensus forecasts. Optimization algorithms can be designed for this task; a simple illustration is sketched below.
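As a simple illustration of this first direction, one could weight each analyst’s forecast by the inverse of his or her past average AFE rather than equally; the weighting rule and numbers below are our own illustrative assumptions, not a scheme developed in this paper.

import numpy as np

def weighted_consensus(forecasts, past_avg_afe, eps=1e-6):
    """Combine forecasts with weights inversely proportional to past average AFE."""
    forecasts = np.asarray(forecasts, dtype=float)
    weights = 1.0 / (np.asarray(past_avg_afe, dtype=float) + eps)
    return float(np.average(forecasts, weights=weights))

# Toy example: three analysts; the most accurate one (past AFE 0.03) receives the most weight
print(weighted_consensus([2.10, 2.25, 1.90], [0.03, 0.10, 0.08]))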

Second, information gathering and processing in different industries conceivably require different sets of skills and analytical capacities. The degree of difficulty and the intensity of work in earnings forecasting may thus vary across industries. How one would characterize and analyze these phenomena, and how the related factors could affect information efficiency in the overall financial system, are subjects of future research interest.