1 Introduction

Cultural industries usually share two common features (Caves 2000): an oligopolistic market structure with a competitive fringe, and sales highly concentrated on a small number of products, mainly because of the superstar effect (Rosen 1981) or “winner-take-all” phenomenon (Frank and Cook 1995). According to the long-tail effect (Anderson 2004, 2006), digitization in production, distribution, promotion and consumption should reduce the sales concentration for cultural products. But does digitization also impact the market structure of cultural industries? The impact of e-commerce on the distribution of sales has been the subject of much debate, but its effect on the strength of competition among publishers has been neglected. The present paper contributes to filling this gap in the literature.

According to Anderson (2004, 2006), the superstar effect tends to be offset by a “long-tail” effect in the digital era. Three forces tend to shift demand from the most popular products (the head of the sales distribution) to niche products (the tail). (1) The decrease in production costs increases the variety supplied and thus increases the length of the tail. (2) The constraints of physical shelf space disappear and distribution costs decrease drastically. Therefore, consumers have easier access to niche products. This fattens the tail because some products will find an online audience sufficient to ensure their profitability. (3) Furthermore, new ways of connecting demand and supply through Web 2.0 (Facebook, Twitter, blogs, forums, recommendation tools, etc.) improve matching between supply and demand by reducing search costs. Consumers may have better knowledge of products closer to their “ideal variety” than those that are highly promoted in the traditional media. Consequently, demand should switch from the mainstream products at the head of the demand curve (the hits) toward a huge number of niche products in the tail.

This prediction is widely debated, and both theoretical (Footnote 1) and empirical works provide conflicting evidence about the existence and the magnitude of the long tail (see Table 1 for a survey of the empirical literature). As far as the publishing industry is concerned, Brynjolfsson et al. (2003, 2010), studying US book sales on Amazon, conclude that the concentration of sales did indeed decrease over the 2000s. In France, Bounie et al. (2010) found that the distribution of sales is less concentrated online than offline. Conversely, Benghozi and Benhamou (2010), working on a sample of French book sales, found that the long-tail effect remains very small online. However, these studies are based on rather limited samples (Benghozi and Benhamou 2010), on bestsellers only (Bounie et al. 2010) or on a mere estimation of sales (Brynjolfsson et al. 2003, 2010). Peltier and Moreau (2012) use comprehensive data over the period 2003–2007 and distinguish online and offline sales. They show that a long-tail effect exists in the French publishing industry, with consumers shifting somewhat from bestsellers to medium or low sellers (Footnote 2). Farchy et al. (2013) provide an overview of the various impacts of digitization on the book industry: categories of works available, changes in consumer uses and cultural diversity. However, all the papers above leave one question unanswered: how does digitization affect the market structure of the book industry (Footnote 3)? Does it favor small publishers from the fringe or dominant firms? In other words, what is the profile of the firms whose product sales grow when sales increase on Internet distribution channels? Addressing these questions constitutes our original contribution.

Table 1 Survey of the main empirical studies on the long-tail effect

More precisely, we study which publishers benefit from the long-tail effect. Is this effect concentrated on small independent publishers (possibly specialized in niche products) or on big publishers and their subsidiary firms, or does it affect all kinds of publishing houses? The answer is far from straightforward. E-commerce might indeed favor limited-audience books, but only those produced by large publishers. This would be the case if e-commerce generated new types of fixed costs that only large firms can afford. Note, however, that while individual publishers’ strategies are key determinants of our results, we do not observe them; we focus instead on their aggregate effect on the market structure of the publishing industry.

The paper is organized as follows: Section 2 introduces our research questions. Section 3 describes our empirical methodology. The results concerning the effect of the long tail are presented in Sect. 4. Section 5 briefly concludes by recalling the main findings and opening avenues for future research.

2 Research questions

Over the period 2004–2010, the French book market remained a duopoly with a double fringe (Footnote 4). Two companies dominated the industry: Hachette and Editis. In 2010, their respective turnovers were 2165 and 751 million euros. On the French market, their turnovers were about the same; the difference was mostly due to the very strong position of Hachette in foreign countries (especially the USA with its subsidiary Grand Central Publishing acquired from Time Warner in 2006). The first fringe comprised medium-sized groups (in particular, La Martinière, Gallimard, Flammarion, Albin Michel). The second fringe comprised small and very small publishers (Footnote 5). Did the development of IT over the period allow medium-sized and small companies to benefit from a long-tail effect by increasing their market share (Footnote 6)? To address this issue, we study whether the long-tail phenomenon observed in the French book industry (Peltier and Moreau 2012) favors the “competitive fringe” of publishers. We therefore pose two research questions.

2.1 R1: Is the concentration of the book market weaker online than offline?

Digitization may lead to a lower concentration of the market through three effects. First, the Internet favors the entry of new publishers into the market by reducing distribution costs. Second, online methods of marketing (Facebook, blogs, recommendation tools, etc.) are more open to small publishers. Although it is costly to reach the top of Google search rankings, small publishers have a better chance of reaching consumers and capturing their attention. Facebook pages, tweets, etc. allow the production of information and the building of reputation, generating network effects. Third, by improving the match between supply and demand, recommendation systems and online word of mouth should favor small publishers, who often offer niche products, look for new talent and try to identify “gaps” in the supply of big publishers.

2.2 R2: Does the difference between online and offline concentration disappear over the period?

Two scenarios are equally plausible for the evolution of the difference between online and offline market concentration. First, big companies may succeed in adapting their supply over time. They improve their promotional methods on the Internet, join social networks, improve their ranking in Google search results by buying “clicks,” etc. They also produce more and more niche products. Moreover, they may buy small pure-player firms with skills in digitization. In this way, they can capture a large part of the long-tail effect. The second possibility depends on the ability of newcomers and independent publishers to defend their relative advantage with specialized or risk-taking readers and writers. In this case, big companies leave the long tail to independent publishers and try to increase the best seller (winner-take-all) effect, so that the difference in concentration may increase over time. Alternatively, even if big companies succeed in using the Internet to promote their low-selling books, small publishers could prove more efficient at this task. Our analysis produces some evidence to evaluate the respective likelihoods of these two scenarios.

3 Empirical methodology

3.1 Data

To capture the effect of the long-tail phenomenon on the relative market shares of different publishers (Footnote 7), we use a comprehensive database of annual sales of physical books by publisher over a period of seven years (2004–2010) obtained from the French subsidiary of the GfK group, one of the world’s leading market research organizations. GfK tracks all book sales in almost all outlets in France (Footnote 8).

In 2010, GfK’s panel included more than 3500 offline and online shops (Footnote 9). Although the number of shops taken into account significantly increased over the period, the extrapolation method used by GfK ensures the representativeness of the panel at the national level. Data provided by GfK focus on two genres: comic books and literature. Literature (including novels, poetry and nonfiction) is the leading segment of the French book market, accounting for 25 % of units sold in 2010 (SNE 2011), and is usually considered the most emblematic genre of the book industry. Comic books represent a smaller market with around 9 % of total units sold in 2010.

Within the database provided by GfK, data can be broken down by channel of distribution. This allows comparisons to be made between online (Amazon …) and traditional sales channels (bookshops, large stores specialized in cultural products, supermarkets). Data on e-books are not reported, but e-books remained marginal in France over the period: digital book sales represented 1.8 % of the book market in 2010 (SNE 2011). Furthermore, to avoid the risk of including the same title twice in the database, and thus overestimating the number of books sold, we have excluded paperbacks (Footnote 10). Since GfK could not distinguish between books whose first edition was in paperback and books that were only reprinted, the former are also excluded from the database.

Our database contains more than 170,000 different titles published by about 4000 publishers over the whole period. In total, 78.4 % of these different titles were literature books and 21.6 % were comics. This yields a sample of more than 400 million copies sold. Online sales, in units, rose from 1.8 % of overall sales in 2004 to 6.6 % in 2010. Data provided by GfK allow us to know accurately how many copies of each of the 170,000 books have been sold each year in the two distribution channels (offline and online). In this paper, book sales are analyzed at the publisher level. Thus, we have gathered the annual sales of copies of all books released by a given publisher in a given genre and in a given distribution channel (offline or online). Tables in the “Appendix” present the main descriptive statistics of the database used in this paper.
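The aggregation step described above can be illustrated with a minimal sketch. The record layout (keys `title`, `publisher`, `genre`, `channel`, `year`, `copies`), the function name and the sample values are all hypothetical; they are not the GfK data format, which the paper does not describe.

```python
from collections import defaultdict

def aggregate_by_publisher(title_records):
    """Sum copies sold per (year, genre, channel, publisher) cell."""
    totals = defaultdict(int)
    for rec in title_records:
        key = (rec["year"], rec["genre"], rec["channel"], rec["publisher"])
        totals[key] += rec["copies"]
    return dict(totals)

# Hypothetical title-level records (illustrative values, not GfK data)
records = [
    {"title": "T1", "publisher": "P1", "genre": "literature",
     "channel": "online", "year": 2010, "copies": 100},
    {"title": "T2", "publisher": "P1", "genre": "literature",
     "channel": "online", "year": 2010, "copies": 50},
    {"title": "T1", "publisher": "P1", "genre": "literature",
     "channel": "offline", "year": 2010, "copies": 900},
    {"title": "T3", "publisher": "P2", "genre": "comics",
     "channel": "offline", "year": 2010, "copies": 400},
]
publisher_sales = aggregate_by_publisher(records)
```

Each resulting cell corresponds to one publisher observation in a given year, genre and channel, which is the unit of analysis used in the rest of the paper.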

3.2 Methodology

To test the difference in sales distribution between the Internet and other channels, following Brynjolfsson et al. (2011), we estimate the Pareto curve for sales by publisher. The equation for Internet and offline data is the following:

$$\ln ({{Sales}}_{j}^{i} ) = \beta_{0}^{i} + \beta_{1}^{i} \ln ({{SalesRank}}_{j}^{i} ) + \varepsilon_{j}^{i}$$
(1)

where \({{Sales}}_{j}^{i}\) denotes the level of sales for each publisher j over the period in the distribution channel i (offline or online). \({{SalesRank}}_{j}^{i}\) is an ordinal ranking of the frequency of sales of each publisher j in the distribution channel i. In this setting (model 1), \(\beta_{1}^{i}\) measures how quickly the sales of a given publisher in a channel decrease as the sales rank rises. The more strongly negative \(\beta_{1}^{i}\) is, the higher the market concentration.
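As an illustration of Eq. (1), the following sketch fits the log-log rank regression by ordinary least squares for a single channel. It is only a sketch under stated assumptions: the function name `pareto_slope` and the two synthetic sales vectors are hypothetical, and the estimation software actually used in the paper is not specified.

```python
import math

def pareto_slope(sales):
    """OLS fit of ln(sales) on ln(rank); returns (intercept, slope)."""
    # Rank 1 is the best-selling publisher; zero sales are dropped
    # because their logarithm is undefined.
    ordered = sorted((s for s in sales if s > 0), reverse=True)
    x = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    y = [math.log(s) for s in ordered]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical publisher sales following exact power laws
concentrated = [1000 / rank ** 1.5 for rank in range(1, 51)]
dispersed = [1000 / rank ** 0.8 for rank in range(1, 51)]

slope_concentrated = pareto_slope(concentrated)[1]
slope_dispersed = pareto_slope(dispersed)[1]
```

In this synthetic example the first series has the steeper (more negative) slope, which is the pattern the text associates with higher market concentration.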

If Internet sales of books by publisher are less concentrated than offline sales, we would expect \(\beta_{1}^{i}\) to be less strongly negative (i.e., lower in absolute value) in the Internet channel than in the conventional channel. This reflects the idea that low-selling publishers (i.e., publishers that obtain higher ranks; see Footnote 11) obtain a larger share of sales in this channel. To test whether the \(\beta_{1}^{i}\) coefficient is significantly less negative for the Internet channel than for the physical channel, we pool Internet and offline data into one dataset (Brynjolfsson et al. 2011). Thus, the linear regression we estimate is the following:

$$\ln ({{Sales}}_{j}^{i} ) = \beta_{0}^{i} + \beta_{1}^{i} \ln ({{SalesRank}}_{j}^{i} ) + \beta_{2}^{i} {{Internet}}_{j} + \beta_{3}^{i} {{Internet}}_{j} \times \ln ({{SalesRank}}_{j}^{i} ) + \varepsilon_{j}^{i}$$
(2)

Internet is a dummy variable equal to one if an observation comes from the Internet channel and zero otherwise, and we interact it with ln(SalesRank). A positive value for \(\beta_{3}^{i}\) would indicate that the market concentration is lower online than offline.
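The pooled specification in Eq. (2) can be sketched as follows. This is a minimal illustration that solves the OLS normal equations directly; the helper names (`solve`, `pooled_fit`) and the synthetic sales series are hypothetical, control variables are omitted, and no standard errors are computed, so it shows only how the interaction coefficient is obtained, not the significance test.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def pooled_fit(offline_sales, online_sales):
    """Return [b0, b1, b2, b3] for
    ln(sales) = b0 + b1 ln(rank) + b2 Internet + b3 Internet * ln(rank)."""
    rows, y = [], []
    for internet, sales in ((0, offline_sales), (1, online_sales)):
        # Ranks are computed separately within each channel
        ordered = sorted((s for s in sales if s > 0), reverse=True)
        for rank, s in enumerate(ordered, start=1):
            log_rank = math.log(rank)
            rows.append([1.0, log_rank, float(internet), internet * log_rank])
            y.append(math.log(s))
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical sales: offline more concentrated than online
offline = [5000 / rank ** 1.4 for rank in range(1, 41)]
online = [300 / rank ** 0.9 for rank in range(1, 41)]
b0, b1, b2, b3 = pooled_fit(offline, online)
```

With these synthetic inputs the interaction coefficient `b3` is positive, equal to the gap between the online and offline slopes, which is the sign the text interprets as lower concentration online.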

However, in our database, this lower concentration of sales online may be a pure artifact. It could be explained by several biases. A first bias is related to the segment of books considered (comics or literature books). If the sales of one segment are less concentrated than those of the other segment and, at the same time, account for a much larger share of sales online than offline, online sales would appear less concentrated. We therefore include in our regression an interaction variable SalesRank × Comics, where Comics is a dummy variable equal to one if the distribution of sales refers to comic books and zero if it refers to literature books.

Likewise, a second bias could be due to temporal specificities over the period. We therefore introduce a continuous variable Year (ranging from 2004 to 2010) as well as the interaction Year × SalesRank.

We can also imagine that the number of titles available each year in both channels might mechanically affect the distribution of sales. For instance, if more different titles are sold on the Internet than offline, concentration by publisher could appear lower online. To check whether a long-tail effect is not only due to the fact that more references are directly available online, we include the variable Titles (which is the log of the number of titles sold per year for each genre and each channel of distribution) as well as the interaction Titles × SalesRank.

Another bias could be related to the specific life cycle a book usually experiences. When a lot of new titles are released during a year, sales are spread over a greater number of titles. If consumers buy more new titles when they use the Internet channel, the concentration of sales by publisher will mechanically be lower in this channel. The interaction News × SalesRank, where the variable News equals the log of the number of new titles that have been released during the same year, allows us to control for this bias.

The difference between online and offline sales concentration might also be explained by the prices charged for books in each channel. In the French case, this bias seems unlikely, because of the “fixed price agreement.” The price of a book is decided by the publisher and is uniform for all retailers. However, as retailers can grant a 5 % discount, we introduce the mean price (Footnote 12) for each publisher (Price) and the interacted variable SalesRank × Price in our regressions.

The long-tail effect may also result from a higher increase in the number of publishers online than offline over the period, which would naturally lead to a lower concentration of the market on the Internet channel. As Table 2 shows, the number of publishers did indeed rise faster online than offline, for both literature and comic books.

Table 2 Evolution of the number of publishers by genre and channel of distribution (2004–2010)

To check whether the long-tail effect is not only due to a higher increase in the number of publishers who sold online than offline, we also introduce a control variable SalesRank × Publishers, where the variable Publishers is the log of the number of publishers who have sold at least one copy (for a given year, a given channel and a given genre).

To assess whether the long-tail effect is not just a temporary phenomenon, we add to the above model the “Internet × SalesRank × Year” variable (model 3). To answer our second research question, “Does the difference between online and offline concentration disappear over the period?”, we observe the coefficient for this interaction variable. If it is negative, we conclude that the difference in concentration vanishes over the period.
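For readability, model 3 can be written out in the notation of Eq. (2). The coefficient label \(\beta_{4}^{i}\), the control vector \(Z_{j}^{i}\) (standing for the controls and interactions described above) and its coefficient vector \(\gamma\) are our own shorthand here, not notation taken from the estimated tables:

$$\ln ({{Sales}}_{j}^{i} ) = \beta_{0}^{i} + \beta_{1}^{i} \ln ({{SalesRank}}_{j}^{i} ) + \beta_{2}^{i} {{Internet}}_{j} + \beta_{3}^{i} {{Internet}}_{j} \times \ln ({{SalesRank}}_{j}^{i} ) + \beta_{4}^{i} {{Internet}}_{j} \times \ln ({{SalesRank}}_{j}^{i} ) \times {{Year}} + \gamma^{\prime} Z_{j}^{i} + \varepsilon_{j}^{i}$$

Under this shorthand, the online-minus-offline difference in the slope with respect to ln(SalesRank) in a given year is \(\beta_{3}^{i} + \beta_{4}^{i} \, {{Year}}\), so with \(\beta_{3}^{i} > 0\) a negative \(\beta_{4}^{i}\) shrinks the gap over time, while a positive one widens it.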

A drawback of the above models is that they do not shed any light on the evolution of the market shares of the various types of firms presented above: duopoly, first fringe and second fringe. A lower concentration could be due to increased market share of the smallest publishers (second fringe), a rise in the medium-sized publishers (first fringe) at the expense of the dominant firms (duopoly) or to a mere reallocation of market shares between these duopolistic firms. To address this issue, we study the market share of the 2, 4, 10, 20 and 50 biggest publishers (CR2, CR4, CR10, CR20 and CR50) for each distribution channel (online vs. offline) and for each segment (literature vs. comics).
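The concentration ratio CRk is simply the combined market share of the k largest firms. A minimal sketch follows, with a hypothetical function name and made-up sales figures for five publishers (not values from Table 5):

```python
def concentration_ratio(sales_by_publisher, k):
    """Combined market share of the k largest publishers (CRk), in [0, 1]."""
    total = sum(sales_by_publisher.values())
    top_k = sorted(sales_by_publisher.values(), reverse=True)[:k]
    return sum(top_k) / total

# Hypothetical copies sold per publisher in each channel
offline = {"A": 500, "B": 300, "C": 100, "D": 60, "E": 40}
online = {"A": 40, "B": 25, "C": 15, "D": 12, "E": 8}

cr2_offline = concentration_ratio(offline, 2)  # (500 + 300) / 1000
cr2_online = concentration_ratio(online, 2)    # (40 + 25) / 100
```

Comparing CRk across several values of k, as in the text, indicates which part of the size distribution (duopoly, first fringe or second fringe) accounts for a change in concentration.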

4 Results

To conduct our analysis, we construct twenty-eight subsamples, one for each combination of genre (comic books or literature), distribution channel (online or offline) and year of the 7-year period (2 genres × 2 distribution channels × 7 years), and pool them.

4.1 Is the concentration of the book market weaker online than offline?

We first study the difference in market concentration between the online and offline channels without any control variables (Table 3).

Table 3 Pareto curve estimates in value [14 subsamples pooled for both Internet sales and offline sales—publishers (LR)]

Results suggest that sales by publisher decrease more slowly as the rank increases on the Internet than in conventional stores. Model 1 in Table 4 provides a test of the significance of this result. The coefficient of SalesRank × Internet is, as expected, positive and highly significant at the 1 % level. The answer to our first research question is thus affirmative: The market concentration is lower online than offline.

Table 4 Pareto curve estimates in value [28 subsamples pooled—publishers (LR)]

Model 2 in Table 4 provides a robustness test of this result by controlling for the various variables (including the number of publishers) that could affect online market concentration. The coefficient of SalesRank × Internet remains positive and highly significant at the 1 % level.

4.2 Does the difference between online and offline concentration disappear over the period?

However, the difference in market concentration between online and offline sales may be temporary. Over time, dominant publishers might have adapted their strategies to improve their market share on the Web. Likewise, on the demand side, the predominance of early adopters of e-commerce with niche tastes that small publishers supply better might vanish with the increase in the number of consumers who purchase books online.

Model 3 in Table 4, which introduces the interacted variable Internet × SalesRank × Year, does not support this scenario. Indeed, the sign of the coefficient is significantly positive. Thus, our results highlight an ever wider difference in concentration between online and offline sales over the period. This result suggests that as yet, the dominant firms have not adapted their strategies to the online market sufficiently to maintain the market share they enjoyed in bricks-and-mortar retailers.

4.3 Is there a significant role for small publishers in the online market?

We have shown that market concentration in the book industry is lower online than offline, at least for the two segments studied. We have also found that this difference does not disappear over the period 2004–2010. Table 5 allows us to shed further light on the comparative level of market concentration online and offline and its evolution over the period.

Table 5 Concentration ratio online and offline, literature and comics, 2004–2010

It turns out that the lower market concentration observed online compared with offline is related to a loss in the market share of the duopolists and of firms at the top of the first fringe (up to 10th rank). On average, over the period and for both literature and comics, the top ten firms perform less well online than offline. The duopolists experience the biggest fall, while the gain in market share is obtained by second fringe firms (ranking above 50th). In keeping with the long-tail hypothesis, the smallest publishers seem to particularly benefit from the rise of the online market. It allows them to overcome the disadvantage of the limited space devoted to their books in conventional retail stores. In France, the distribution networks supplying booksellers belong to the largest publishers. Their bargaining power is thus much higher with conventional retailers than with online retailers such as Amazon. Moreover, the online market is probably less favorable to firms that rely on traditional marketing campaigns to promote their books. Conversely, small publishers, who usually promote less popular authors, are less disadvantaged when using online promotion and recommendation tools. However, it is interesting to note that the differences in concentration between online and offline markets are wider for the literature segment than for comics. On average over the period, in the literature segment, the online market share of the duopoly was 7.5 % below their offline market share, whereas the firms above the 50th rank had a total online market share 7 % above their offline level. In the comics segment, these two figures were −2 and +1.4 %, respectively.

Table 5 also allows us to better understand the dynamics at work over the period. Whatever the market segment and the channel of distribution, market concentration tended to decrease over the period 2004–2010. The four leading firms lost about 10 % of their market share, while two types of firms benefited from this weakening, depending on the channel of distribution and the segment considered. Both online and offline, and in both literature and comic books, the firms of the first fringe (between 5th and 10th rank) enjoyed growth in their market share between 2004 and 2010. For literature books, the gain was +6.5 % on the online market and +9.2 % on the offline market. For comic books, these figures were +4.9 and +7.4 %, respectively. The second type of firms that benefited from the weakening of market leaders was the firms of the second fringe. But this result only holds online and for literature books (+6.8 %). The impact of IT on market concentration in the comics segment is quite counterintuitive. Since comic book readers are usually younger and more familiar with digital technology, we would have expected the development of online distribution and recommendation tools to lead to a larger fall in concentration for comic books, to the benefit of small publishers able to match young consumers’ preferences. Our results show that consumers’ tastes still focus on best sellers, both offline and online. The importance of mimetic behavior, reinforced by the broadcasting of TV programs that are derivative products of best-selling comic books, probably partially explains this result.

5 Conclusion

The paper contributes to the empirical studies on the long-tail effect by analyzing its consequences on the market structure of the book industry. Four main findings are highlighted. First, a long-tail effect exists when sales are counted by publisher. In the French book industry, the lesser concentration of sales online versus offline (Peltier and Moreau 2012) goes hand in hand with a deconcentration of the market structure. Second, this trend becomes more and more evident over the period 2004–2010. Third, the type of firms that benefit from the erosion of the leaders’ market share, both offline and online, depends on the segment of publishing activity considered. In the comics segment, it is the first fringe of publishers that benefits most from the dominant firms’ loss of market share online, while in the literature segment it is the smallest firms of the second fringe that benefit most. Finally, we show that the rise of online sales does not drastically change the relative level of concentration when winner-take-all habits are frequent (as in the case of comics).

Further research is needed on at least two issues. First, it would be interesting to investigate the specific role of independent publishers in the top 100 sales. This would show whether the increase in the market share of these publishers is due to a rise in the sales of long-tail books alone, or if the position of the firms at the core of the oligopoly is also undermined for best sellers. Second, it would be interesting to test the long-tail effect on the e-book market, which has developed enormously since 2007, especially in the USA (Footnote 13).