1 Background

Industrial classification is used both in practice and in various research fields, for example as a benchmark when selecting comparable companies or assessing financial condition. The classifications used as criteria differ by country. In Japan, commonly used systems include the Japan Standard Industrial Classification (20 large classifications and 99 middle classifications) established by the Ministry of Internal Affairs and Communications (MIC), the Nikkei classification (36 middle classifications and 256 small classifications), and the TOPIX Sector Indices (10 large classifications and 33 middle classifications). Globally, the Global Industry Classification Standard (GICS), established by Standard & Poor’s and Morgan Stanley Capital International, serves as an international standard.

Traditionally, the industrial classifications constructed by various institutions assign exactly one industry to each company and do not permit overlapping industries. Moreover, the existing classifications are defined on the basis of sales alone. In recent years, however, the business domains of companies have changed substantially owing to the increase in M&A and aggressive business transformation, so industrial identification is becoming more ambiguous than before. For a diversified company, for example, the assigned industry can hardly be said to reflect the enterprise accurately when the classified business is smaller in sales or profits than the company’s other businesses. Against this background, describing a company with only one industry is limited in how well it represents the company’s real situation, and a new industrial classification is required. This paper examines whether such a new industrial classification technique can serve as an alternative to traditional industrial classification.

Assigning several industries to one company has at least two merits. First, it makes it easier to compare segments across companies. How the business segments shown in the securities report are divided is left to the discretion of each company, so segments are not directly comparable between companies; assigning industries on a common basis would make it possible to compare segments among several companies on a unified scale. Second, it may allow more detailed empirical analysis. The existing industrial classification systems assign one industry to each company. However, looking at the breakdown of a company’s businesses, using sales as the sole standard has limits, for instance when a business has a high market share even though its sales are low. We think that a classification allowing multiple industries can take such influences into account and enable more robust control of industry characteristics.

The remainder of this paper is organized as follows. Section 2 reviews previous research. Section 3 describes the data used in the analysis. Section 4 presents our method, and Sect. 5 reports the results. Section 6 applies the analysis to the manufacturing industry of the Japan Standard Industrial Classification and reports the results. Finally, Sect. 7 summarizes the paper and discusses its remaining issues. To restate, the purpose of this paper is to explore the possibility of a new industrial classification technique.

2 Related Work

There are very few research papers on industrial classification. These studies fall mainly into two areas: the reliability of the existing systems and the construction of new systems.

In the United States, there are several studies on the reliability of industrial classification. For example, Elton and Gruber [1] stated that there is no guarantee that existing industrial classifications form homogeneous groups, and that an industrial classification constructed using statistical methods may be more accurate. Hrazdil, Trottier and Zhang [2] analyzed which of the Standard Industrial Classification (SIC), the North American Industry Classification System (NAICS), and the Global Industry Classification Standard (GICS) is most effective at forming homogeneous groups of stocks with similar operating characteristics. Weiner [3] performed cluster analysis based on companies’ financial information and stated that the resulting industrial classification was more accurate than the existing industrial classifications. In Japan, on the other hand, very few studies address the reliability of industrial classification; the only ones are those of Kimura [4], Shintani [5], and Nakaoka [6].

Various methods have been proposed for constructing a new industrial classification. For example, Sasaki and Shinno [7] and Ando and Shirai [8] proposed methods that acquire industry type information from web pages and classify companies based on that information. Isogai and Dam [9] proposed methods of classifying companies based on stock price fluctuations. Peneder [10] classified companies through statistical cluster analysis and showed that cluster analysis provides valuable tools for industrial classification. Lee, Ma, and Wang [11] focused on “co-search” on the Internet and proposed a new method to recognize economically related peer companies.

Lewellen [12] built an industrial classification that allows overlapping, based on which companies each company’s competitors are. Yang [13] and Budayan, Dikmen and Birgonul [14] classified companies using clustering methods. Yang [13] built an industrial classification system that allows overlapping by conducting latent class analysis for companies in Japan and Malaysia. Budayan, Dikmen and Birgonul [14] classified construction companies in Turkey using three clustering methods: K-means, self-organizing maps, and Fuzzy C Means. They set the number of clusters to 3 and drew their conclusions after comparing the three methods experimentally. They reported that clustering methods such as self-organizing maps and Fuzzy C Means may provide more valuable results.

However, three limitations can be pointed out in these papers. First, comparison with the existing industrial classifications has not been conducted thoroughly. Second, the validity of the results has not been verified thoroughly. Third, the analyses using stock price fluctuations depend on heavy econometric models. It therefore remains to be seen whether cluster analysis that allows overlapping can lead to more accurate results and whether its reliability can be guaranteed.

The purpose of this paper is to construct a new industrial classification system that allows overlapping, and compare it with the existing industrial classification.

3 Data

We used financial data for 2016 for companies listed on the First Section of the Tokyo Stock Exchange. The number of companies analyzed is 1210. The data were obtained from Nikkei NEEDS. We use the companies belonging to 17 industries among the large classifications of the MIC. We exclude (1) companies that belong to compound services, (2) companies belonging to government, except elsewhere classified, and (3) companies belonging to industries unable to classify, because no companies in the sample belong to these three industries. In addition, we separately analyze the companies belonging to the manufacturing industry, which is one of the 17 industries. These companies belong to 23 industries among the middle classifications of the MIC; we excluded companies that belong to the manufacture of leather tanning, leather products and fur skins. The number of these companies is 694.

The variables used for the overlapping cluster analysis and for evaluating validity, together with their definitions, are shown in Table 1.

Table 1. Descriptive statistics

We compared our new industrial classification technique, based on overlapping cluster analysis, with an existing industrial classification. For this comparison, we used the Japan Standard Industrial Classification (JSIC) as the benchmark. We obtained the JSIC data from Nikkei NEEDS.

4 Method

4.1 Cluster Analysis

In this paper, we adopted Fuzzy C Means (FCM), proposed by Bezdek [15], to classify companies. Ordinary cluster analysis requires each data point to belong to exactly one cluster. In FCM, by contrast, introducing fuzzy sets allows each learning vector to belong to two or more clusters. FCM minimizes the objective function in Eq. (1):

$$ J = \sum_{i = 1}^{N} \sum_{k = 1}^{K} \left( g_{ik} \right)^{m} \left\| x_{i} - c_{k} \right\|^{2} $$
(1)

In this paper, we set the number of clusters (\( K \)) to 17, which is larger than in Budayan, Dikmen and Birgonul [14]. This number is the same as the number of large classifications of the MIC, except for three industries (financial or insurance industry, the complex service business, public service industry, and industry not classifiable). The distance (\( \left\| {x_{i} - c_{k} } \right\| \)) is the Euclidean distance. The degree of fuzzification (\( m \)) is 2. For the data (\( x_{i} \)), we selected four variables: (1) the operating margin, which represents profitability, (2) the capital adequacy ratio, which represents safety, (3) the total asset turnover, which represents activity, and (4) the sales growth rate, which represents growth. Definitions of these variables and descriptive statistics of the entire sample are shown in Table 1. We standardized the variables before the analysis. As a result of the analysis, each company has a membership value (\( g_{ik} \)) for each of the 17 clusters. For each company, we rank the clusters in descending order of membership value: the cluster with the highest membership value is the company’s 1st industry, and the cluster with the lowest membership value is its 17th industry (see Fig. 1).

Fig. 1. Image after work
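The procedure in Sect. 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it uses the standard alternating updates for Eq. (1), a deterministic farthest-first initialization (the paper does not state how centers were initialized), and hypothetical toy data with two clusters instead of 17. The function names are ours.

```python
import math


def standardize(rows):
    """Z-score each column, as the text standardizes before clustering."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
            for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(r, means, stds)] for r in rows]


def sq_dist(x, c):
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def farthest_first_centers(data, n_clusters):
    """Deterministic initial centers (an assumption, not the paper's choice)."""
    centers = [list(data[0])]
    while len(centers) < n_clusters:
        nxt = max(data, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(nxt))
    return centers


def memberships(data, centers, m):
    """g_ik proportional to ||x_i - c_k||^(-2/(m-1)); each row sums to 1."""
    rows = []
    for x in data:
        inv = [(1.0 / max(sq_dist(x, c), 1e-12)) ** (1.0 / (m - 1.0))
               for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=200, tol=1e-10):
    """Alternate membership and center updates to reduce J in Eq. (1)."""
    centers = farthest_first_centers(data, n_clusters)
    for _ in range(n_iter):
        g = memberships(data, centers, m)
        new_centers = []
        for k in range(n_clusters):
            w = [row[k] ** m for row in g]  # weights (g_ik)^m
            new_centers.append([sum(wi * x[j] for wi, x in zip(w, data)) / sum(w)
                                for j in range(len(data[0]))])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(data, centers, m)


def ranked_industries(membership_row):
    """Cluster indices from highest to lowest membership (1st industry first)."""
    return sorted(range(len(membership_row)), key=lambda k: -membership_row[k])


# Hypothetical toy data: 6 firms x 4 indicators, forming two obvious groups.
toy = [
    [0.10, 0.50, 1.00, 0.05],
    [0.11, 0.52, 0.98, 0.04],
    [0.09, 0.49, 1.02, 0.06],
    [0.30, 0.20, 0.40, 0.20],
    [0.31, 0.22, 0.42, 0.21],
    [0.29, 0.19, 0.38, 0.19],
]
centers, g = fuzzy_c_means(standardize(toy), n_clusters=2)
first_industry = [ranked_industries(row)[0] for row in g]
```

Because every firm receives a membership value for every cluster, `ranked_industries` yields the full ordering from the 1st industry to the last, which is the ranking used in the verification steps.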

4.2 Verification Method

We use two verification methods to confirm the validity of the FCM classification constructed in Sect. 4.1 by comparing it with the existing industrial classification. One is the composite variance (this section), and the other is the absolute prediction error (Sect. 4.3).

First, we verified the reliability of the industrial classification using the composite variance proposed in Amit and Livnat [16]. The composite variance makes it possible to compare the homogeneity of groups of companies.

The composite variance value (\( S_{a} \)) of the industry category \( \left( a \right) \) for a certain evaluation index \( \left( x \right) \) is calculated as in Eq. (2), with the group mean and group variance given by Eqs. (3) and (4).

$$ S_{a} = \frac{ \sum_{i = 1}^{N_{a}} \left( n_{ai} - 1 \right) V_{x_{ai}} }{ \sum_{i = 1}^{N_{a}} \left( n_{ai} - 1 \right) } $$
(2)
$$ \bar{X}_{ai} = \frac{1}{n_{ai}} \sum_{k = 1}^{n_{ai}} X_{aik} $$
(3)
$$ V_{x_{ai}} = \frac{1}{n_{ai} - 1} \sum_{k = 1}^{n_{ai}} \left( X_{aik} - \bar{X}_{ai} \right)^{2} $$
(4)

Here, \( S_{a} \) is the composite variance value in the industry category \( \left( a \right) \), \( N_{a} \) is the number of industry groups in the industry category \( \left( a \right) \), \( n_{ai} \) is the number of firms in group \( \left( i \right) \) when classification \( \left( a \right) \) is used, \( X_{aik} \) is the value of the evaluation index \( x \) for firm \( k \) in group \( i \), \( \bar{X}_{ai} \) is the mean of \( x \) in group \( i \), and \( V_{{x_{ai} }} \) is the variance of \( x \) in group \( i \).

We then take the ratio of \( S_{a} \) to the composite variance value \( S_{b} \) of another industrial classification \( \left( b \right) \). This variance ratio, shown in Eq. (5), follows the F distribution.

$$ S\left( {a,b} \right) = \frac{{S_{a} }}{{S_{b} }} $$
(5)

If this ratio is statistically significantly different from 1 in the F test, we can judge that the two industrial classifications differ in homogeneity. If the variance ratio is statistically significantly greater than 1, which means that the denominator \( S_{b} \) is statistically significantly smaller, the industrial classification \( \left( b \right) \) is evaluated to be more reliable than the industrial classification \( \left( a \right) \). Note that a comparison by composite variance measures reliability in relative, not absolute, terms.

In this paper, we evaluated homogeneity using the operating margin, the capital adequacy ratio, the total asset turnover, and the sales growth rate, which are the variables used in FCM. We compare each of the first, second, third, fourth, and fifth industries assigned by FCM with the JSIC. When calculating the composite variance, we used the values before standardization.

The composite variance value \( \left( S \right) \) of each evaluation index is greatly affected by outliers. Therefore, we calculated it after excluding outliers in the top and bottom 1% of each evaluation index.
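Equations (2) to (5) and the outlier exclusion can be illustrated with the short sketch below. It is only an illustration: the function names, the rule of dropping the lowest and highest 1% of observations by rank, and the toy values are our assumptions, and the F test itself (which needs an F distribution from a statistics library) is not shown.

```python
from collections import defaultdict
from statistics import variance


def trim_outliers(records, frac=0.01):
    """Drop the lowest and highest `frac` share of (group, value) records."""
    ordered = sorted(records, key=lambda r: r[1])
    k = int(len(ordered) * frac)
    return ordered[k:len(ordered) - k] if k else ordered


def composite_variance(records):
    """Eq. (2): within-group variances pooled with weights (n_ai - 1)."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    numerator = 0.0
    denominator = 0
    for values in groups.values():
        if len(values) < 2:
            continue  # a single-firm group has no sample variance
        # statistics.variance is the sample variance of Eq. (4).
        numerator += (len(values) - 1) * variance(values)
        denominator += len(values) - 1
    return numerator / denominator


# Hypothetical index values for six firms under two classifications.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels_a = ["A", "B", "A", "A", "B", "B"]  # mixes unlike firms
labels_b = ["A", "A", "A", "B", "B", "B"]  # groups like firms together

s_a = composite_variance(list(zip(labels_a, values)))
s_b = composite_variance(list(zip(labels_b, values)))
ratio = s_a / s_b  # Eq. (5); a value above 1 favors classification (b)
```

In this toy case classification (b) groups similar firms, so its composite variance is the smaller one and the ratio exceeds 1, matching the reading of Eq. (5) given in the text.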

4.3 Absolute Prediction Error

As the second verification method, we compared the industrial classifications using the absolute prediction error (APE) proposed by Weiner [3]. Using the result of Sect. 4.1, we can select similar companies with our industrial classification. We then compare the APE obtained with these companies with the APE obtained with companies selected under the traditional classification. The APE is computed for the enterprise value estimated by a multiple approach.

A multiple approach estimates the enterprise value of a firm by multiplying its earnings by an enterprise-value-to-EBIT multiple determined from a set of comparable companies.

The estimate of firm \( i \)’s enterprise value, \( \widehat{EV}_{i} \), is given by

$$ \widehat{EV}_{i} = \mathop{\mathrm{median}}\limits_{j \in C_{i}} \left( \frac{EV_{j}}{EBIT_{j}} \right) \times EBIT_{i} , $$
(6)

where \( C_{i} \) is the set of firms comparable to firm \( i \) based on FCM, \( EV_{j} \) is firm \( j \)’s enterprise value, and \( EBIT_{j} \) is firm \( j \)’s EBIT.

The valuation accuracy is measured by the deviation between the estimated firm value and the observed firm value. We therefore calculate \( APE_{i} \) for firm \( i \) as in Eq. (7).

$$ APE_{i} = \left| {\frac{{\widehat{EV}_{i} - EV_{i} }}{{EV_{i} }}} \right| $$
(7)

Here, \( \widehat{EV}_{i} \) is the estimated enterprise value for firm \( i \) and \( EV_{i} \) is the observed market value for firm \( i \). We statistically test the results by performing a Wilcoxon rank sum test on the differences between our new industrial classification system and the existing method of industrial classification. If the \( APE_{i} \) under our industrial classification is statistically significantly smaller than that under the existing industrial classification, the selection of similar companies works well and suits the multiple approach. Such a result would suggest that our industrial classification forms more homogeneous groups.
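As a worked illustration of Eqs. (6) and (7), the sketch below estimates an enterprise value from the median multiple of a set of comparables and computes the resulting APE. The function names and all figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import median


def estimate_enterprise_value(target_ebit, comparables):
    """Eq. (6): median EV/EBIT multiple of the comparables times the target's EBIT.

    `comparables` is a list of (EV_j, EBIT_j) pairs for the firms in C_i.
    """
    multiple = median(ev / ebit for ev, ebit in comparables)
    return multiple * target_ebit


def absolute_prediction_error(ev_hat, ev):
    """Eq. (7): absolute relative deviation of the estimate from the observed value."""
    return abs((ev_hat - ev) / ev)


# Hypothetical comparables with EV/EBIT multiples of 10, 12 and 9 (median 10).
comparables = [(100.0, 10.0), (240.0, 20.0), (90.0, 10.0)]
ev_hat = estimate_enterprise_value(target_ebit=15.0, comparables=comparables)
ape = absolute_prediction_error(ev_hat, ev=120.0)
```

Here the estimate is 10 × 15 = 150 against an observed value of 120, so the APE is 30 / 120 = 0.25. Repeating this for every firm under each classification gives the two APE samples that are compared with the rank sum test.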

5 Result

5.1 Analysis of Fuzzy C Means

Figure 2 displays the results of the cluster analysis, plotting the operating margin on the horizontal axis and the capital adequacy ratio on the vertical axis. Because the figure shows only two axes although the analysis uses four variables, the result may be difficult to read. Nevertheless, clusters 8, 13, and 14 are clearly separated.

Fig. 2. Clustering result of Fuzzy C Means

Table 2 displays the companies belonging to each industry when the first industry assigned by FCM is used. For each industry, we list three companies in descending order of market capitalization in FY2016. We can observe that industry number 8 is a group of companies known for their high capital adequacy ratio and that industry number 17 is a group with high operating margins. In this way, the characteristics of some groups can be identified.

Table 2. Clustering result of Fuzzy C Means

5.2 Composite Variance Analysis

Table 3 shows the results of comparing reliability by the composite variance. From the first industry \( S_{M1} \) to the third industry \( S_{M3} \), the results are statistically significant for all variables. Table 3 shows that, for all four financial indicators from the first industry to the third industry, the composite variance of the newly constructed industrial classification, which is the denominator of the ratio, is statistically significantly smaller than the numerator. This indicates that our new industrial classification method forms more homogeneous groups than the JSIC. However, the ratios for the fourth industry \( S_{M4} \) and the fifth industry \( S_{M5} \) are smaller than 1 for the total asset turnover and the sales growth rate, which means that the composite variance in the denominator is greater than that in the numerator. In other words, for the total asset turnover and the sales growth rate, homogeneity is lower than under the existing industrial classification.

Table 3. The result of composite variance

In addition, the composite variance ratios in Table 3 decrease from the first industry to the fifth industry. This means that the variance of the industrial classification created by FCM, which is the denominator of the composite variance ratio, becomes larger. That is, homogeneity declines as we move down the industry ranking.

5.3 Absolute Prediction Error Analysis

We performed the Wilcoxon rank sum test, and Table 4 displays the results for the APE. The APE rows show the APE under the newly constructed industrial classification system and under the JSIC. The difference rows show the difference between the APEs of the two classification systems; a negative value indicates that our new industrial classification system values companies more accurately than the JSIC. We found that the difference between the APE of our new industrial classification system and the APE of the JSIC was statistically significantly negative at the 10% level for the first and second industries; that is, the APE of the first and second industries is smaller than that of the JSIC. We therefore conclude that the first and second industries constructed by FCM are more homogeneous than the industries based on the JSIC. From the third industry onward, however, the difference is not statistically significant. Table 4 also shows that the \( APE \) of the newly constructed industrial classification increases from the first industry to the fifth. In other words, homogeneity is lost as we proceed down the industry ranking.

Table 4. Result of absolute prediction error
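For readers who want to see how such a comparison can be computed, the sketch below implements a Wilcoxon rank sum test with the normal approximation. It is an illustration only: it omits tie and continuity corrections, the approximation is rough for small samples, the function names are ours, and the two APE samples are hypothetical values, not those behind Table 4.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def rank_sum_test(sample_a, sample_b):
    """Return (z, p_lower) for the rank sum of sample_a.

    p_lower is the one-sided p-value for "sample_a tends to be smaller
    than sample_b", using the standard normal approximation.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n_a])  # rank sum of the first sample
    mean = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean) / sd
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return z, p_lower


# Hypothetical APE samples under two classifications.
ape_new = [0.10, 0.15, 0.20, 0.25, 0.30]
ape_existing = [0.35, 0.40, 0.45, 0.50, 0.55]
z, p_lower = rank_sum_test(ape_new, ape_existing)
```

A negative `z` with a small `p_lower` corresponds to the case described above, in which the APE under the new classification is significantly smaller than under the existing one.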

6 Manufacturing Industry

In Sect. 5, we constructed our new industrial classification and verified its validity for all industries. In this section, we focus on the manufacturing industry in the large classification of the JSIC, classify its companies with our FCM method, and compare the result with the existing industrial classification. In FCM, we set the number of clusters to 23, which equals the number of manufacturing industries used. The other parameter settings are the same as in Sect. 4.1. We also apply the same verification methods: the composite variance (Sect. 4.2) and the absolute prediction error (Sect. 4.3).

6.1 Composite Variance Analysis

Table 5 shows the results of the composite variance analysis for the manufacturing industry. For all four financial indicators from the first industry to the third industry constructed by FCM, the composite variance of the newly constructed industrial classification system, which is the denominator of the ratio, is statistically significantly smaller than the numerator. This suggests that the newly constructed industrial classification system provides more homogeneous groupings of firms than the existing industrial classification. However, for the sales growth rate of the fourth industry and for the total asset turnover and the sales growth rate of the fifth industry, the ratios are not statistically significantly larger than 1. This means that the composite variance in the denominator cannot be distinguished from that in the numerator. For these indicators, our newly constructed method is therefore comparable to the existing classification method.

Table 5. The result of composite variance

6.2 Absolute Prediction Error Analysis

We performed the Wilcoxon rank sum test, and Table 6 shows the results for the APE in the manufacturing industry. The APE rows show the APE under the newly constructed industrial classification system and under the JSIC. The difference rows show the difference between the APEs of the two classification systems; a negative value indicates that our new industrial classification system values companies more accurately than the JSIC. We found that the difference between MM1 and N is statistically significantly negative for the first industry; that is, the APE of the first industry is smaller than that of the JSIC. We therefore conclude that, in the first industry, our new industrial classification system groups more similar companies together than the JSIC does. From the second industry to the fifth industry, however, the difference is statistically significantly positive. This confirms that the JSIC groups are more similar than those of our classification system from the second to the fifth industries.

Table 6. Result of Absolute Prediction Error for manufacturing industry

Compared with the result of Sect. 5.3, the number of industries constructed by FCM that are more homogeneous than the JSIC has decreased from 2 to 1. However, the APE of the first industry in this analysis is smaller than the corresponding APE in Sect. 5.3. This result suggests that the more finely we classify the companies, the smaller the error between the actual enterprise value and the enterprise value estimated by the multiple approach becomes.

7 Conclusion and Further Discussion

In this paper, we analyzed whether it is possible to construct a new industrial classification system using FCM. The results of the composite variance and APE analyses for all industries in the JSIC indicate that FCM is effective in forming homogeneous industrial clusters. The results for the manufacturing industry in the JSIC suggest that companies may be classified more homogeneously by dividing them more finely. FCM is thus effective in constructing a new industrial classification system. Moreover, Nikkei NEEDS shows up to three industries for each company. Our finding from the composite variance analysis, that the first three industries constructed by FCM are reliable, is consistent with this. In conclusion, FCM is one tool for assigning multiple industries to one company. Industrial classification with the proposed FCM method may make it possible to express a company’s situation and may help us perceive a company’s reality correctly.

Finally, we would like to point out three limitations of this paper. First, further analysis is needed to make FCM effective over a wider range of clusters. Because this paper followed the validation methods used in previous research, these methods are not necessarily suitable for FCM. Second, FCM does not always yield the same result each time it is run. It is therefore necessary to verify the robustness of the classification result by repeating the analysis multiple times. Third, we classified all companies into 17 groups based on financial data, but it is difficult to identify distinct financial features for each cluster from only four financial indicators. We plan to apply other indicators in constructing our classification method to gain a deeper understanding of the economic features of each cluster.