1 Introduction

ESG investment refers to investment that considers the following three elements in corporate analysis and evaluation:

  • “E” : Environment

  • “S” : Social

  • “G” : Governance

ESG investments have increased in recent years. According to the Global Sustainable Investment Alliance, which compiles data on the global balance of ESG investments, the balance of ESG investments in five major countries/regions (Japan, the United States, Europe, Canada, and Australia/NZ) was approximately $35.3 trillion as of 2020 (Fig. 1).

Fig. 1. Growth of the global ESG investment balance [1]

In particular, the investment balance in Japan increased six-fold from 2016 to 2020. Japan now accounts for 8% of global ESG investment, and interest in ESG investment within the country has grown significantly. Consequently, efforts have been made to institutionalize and mandate the disclosure of ESG-related information, which has so far been disclosed primarily on a voluntary basis. Japan requires the disclosure of ESG-related information in securities reports and corporate governance reports [2]. In particular, “integrated reports,” which convey important financial information about a company’s sustainable value creation together with non-financial information, including ESG, in a single medium, have garnered significant attention. According to a survey by Edge International, Inc., 716 companies published integrated reports in 2021, compared to 599 in 2020 (an increase of 117) (Fig. 2) [3]. Consequently, there has been growing interest in the analysis of integrated reports.

Fig. 2. Increase in the number of companies publishing integrated reports [3]

This study focused on the banking industry for the following reasons. Financial institutions have a significant social impact, which makes their engagement with ESG particularly important. Banks have strong relationships with many stakeholders; hence, there is a high demand for their ESG information. Furthermore, this industry is advancing toward institutionalizing and mandating the disclosure of ESG-related information; thus, it is among the few sectors with readily available integrated report data.

Tazawa et al. [4] analyzed the integrated reports of universities, indicating their potential as valuable resources for new indicators in university evaluations. Kagami et al. [5] analyzed transitions in the integrated reports of a single company based on information such as words and their frequencies, and examined how corporate challenges changed along with societal changes. However, to date, no research has visualized and compared the integrated reports of multiple companies.

Although integrated reports have garnered attention, incorporating such information presents new challenges. Non-financial ESG-related information is often published as textual rather than numerical data. Moreover, integrated reports typically comprise dozens of pages per company, so reading the documents of multiple companies to understand overall industry trends requires a considerable amount of time. Therefore, this study attempted to understand the overall trends in the integrated reports of a specific industry efficiently and visually using text mining. Focusing on the banking sector, this study reviewed the publication status of integrated reports. Furthermore, co-occurrence network diagrams were used to visualize the ESG-related information emphasized by different banks and to examine future focal points and challenges in investment. The visualization results are easy for beginners in management studies to understand and can potentially serve as educational materials. Consequently, this study also explored, through a survey, the possibility of utilizing these findings in management and economics education.

2 Analytical Method

In this study, the Jaccard coefficient was used to calculate co-occurrence in the creation of co-occurrence network diagrams.

2.1 Jaccard Coefficient

The Jaccard coefficient J(A, B) for sets A and B is defined as follows (Fig. 3).

$$\begin{aligned} J(A, B) = \frac{|A \cap B|}{|A \cup B|} \end{aligned}$$
(1)

That is, it is the ratio of the size of the intersection of the two sets to the size of their union.

Fig. 3. Jaccard coefficient

Therefore, the Jaccard coefficient represents the proportion of elements in which both terms are used among all elements that use at least one of the two terms; its value lies in the range 0–1. A higher value indicates a greater proportion of elements using both terms, signifying a stronger association between the two terms in the text. This study adopted the Jaccard coefficient because it is calculated by dividing the number of elements common to the two sets by the number of distinct elements in their union, so that elements shared by both sets are counted only once; this makes it suitable for co-occurrence analysis. Moreover, the Jaccard coefficient was chosen because it can effectively clarify the relationships between different items when handling large datasets. It has several other advantages, including its ease of interpretation, generality, suitability to different types of data, and ability to fairly assess the overlap between two sets.
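As a minimal sketch of Eq. (1), the following Python snippet computes the coefficient for two terms, each represented by the set of sentence indices in which it appears. The function name `jaccard`, the variable names, and the example data are illustrative assumptions and are not taken from the study; returning 0.0 for two empty sets is likewise a convention chosen here.

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty (a convention)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data: indices of the sentences in which each term appears.
sentences_with_region = {0, 1, 2, 5}
sentences_with_economy = {1, 2, 3}

score = jaccard(sentences_with_region, sentences_with_economy)
print(round(score, 2))  # intersection {1, 2}, union {0, 1, 2, 3, 5} -> 0.4
```

In this example the two terms share two of the five sentences that contain either term, which gives a coefficient of 0.4.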

2.2 Tools Used in Analysis

Table 1 lists the tools used in this study.

Table 1. Tools Used for Analysis

2.3 Preprocessing

In text mining, preprocessing, such as morphological analysis, handling of stop words, and unification of notation variations, is necessary. These processes are described in detail as follows.

Morphological Analysis. Morphological analysis is a part of natural language processing. It involves breaking down natural-language text, such as the language we use daily, into “morphemes” (the smallest meaningful units of expression) and classifying them according to their parts of speech. In other words, it is a method for extracting strings from text and categorizing them.

Stop Words and Unification of Notation Variations. Stop words are words that are generally excluded from natural language processing because they are so common that they carry little useful information. In the unification of notation variations, words are replaced so that character types are standardized and spelling and notation variations are absorbed into a single form.
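The two steps can be sketched in Python as follows, assuming the text has already been split into tokens. The stop-word list `STOP_WORDS`, the variant map `VARIANTS`, and the use of Unicode NFKC normalization to standardize character types are all hypothetical choices made for illustration; the study does not state which lists or normalization rules were used.

```python
import unicodedata

# Hypothetical stop-word list and variant map; a real study would curate these.
STOP_WORDS = {"こと", "もの", "ため"}
VARIANTS = {"サステナビリティー": "サステナビリティ", "お客さま": "お客様"}


def normalize(token: str) -> str:
    """Unify character width via NFKC, then map known variants to one form."""
    token = unicodedata.normalize("NFKC", token)
    return VARIANTS.get(token, token)


def clean(tokens: list[str]) -> list[str]:
    """Normalize each token and drop stop words."""
    normalized = (normalize(t) for t in tokens)
    return [t for t in normalized if t not in STOP_WORDS]


# Full-width "ＥＳＧ" becomes half-width "ESG"; "こと" is dropped as a stop word.
tokens = ["ＥＳＧ", "こと", "サステナビリティー", "お客さま"]
print(clean(tokens))
```

Normalizing before the stop-word check means that a stop word written in a variant notation is still removed.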

3 Analysis Procedure

3.1 Analysis Data

This study aimed to elucidate the trends in a specific industry based on integrated reports. Therefore, integrated reports issued for 2016 and 2021 were selected to understand and analyze the differences in content between the two years. The year 2016 was chosen because it precedes both the increase in the number of companies issuing integrated reports and in the balance of ESG investments and the outbreak of the COVID-19 pandemic. In contrast, 2021 was chosen because the pandemic was ongoing during this period, and it was therefore of interest to analyze which aspects companies focused on in their integrated reports. This study analyzed 74 banks/groups in 2016 and 87 banks/groups in 2021. These banks were divided into three categories: city banks, first-tier regional banks (local banks), and second-tier regional banks. A breakdown of the sample numbers is presented in Table 2.

Table 2. Breakdown of sample numbers

Banks that had provided integrated reports or “Disclosure Magazines” in PDF format on their websites were considered for analysis.

3.2 Analysis Process

The general outline of the analysis comprised four stages: 1) data collection, 2) data preparation, 3) visualization, and 4) analysis. This process is illustrated in Fig. 4.

Fig. 4. Analysis process employed in this study

Data Collection. PDF files of integrated reports or disclosure magazines were collected from the website of each bank and converted into textual data.

Data Preparation. Morphological analysis was conducted sentence-by-sentence on the converted text data using MeCab and NEologd. The text was divided into words, and only nouns were extracted. Different notations for the same noun were unified into a standard form. Common words, symbols, and accounting terms deemed unnecessary for the analysis were excluded as stop words.

3.3 Visualization/Analysis

In the co-occurrence network diagram, a pair of words was considered to co-occur if both words appeared in the same sentence. Therefore, we searched for combinations of words that appeared in the same sentence by enumerating the combinations using Cartesian products. The degree of co-occurrence was calculated using the Jaccard coefficient. As a guideline, Jaccard coefficient values of 0.1 and 0.2 were taken to indicate relevance and strong relevance, respectively. A co-occurrence network diagram was prepared using the following three steps:

  1. Addition of words as nodes in the graph.

  2. Addition of only edges where the Jaccard coefficient exceeded the threshold.

  3. Removal of nodes that were isolated and not connected to any other nodes.

The color of the nodes is presented in Table 3.

Table 3. Color coding of the co-occurrence network diagram

In addition, words that appeared more frequently were depicted with larger circles.
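The construction described in this section can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the function `build_network`, the default threshold of 0.1, the toy `sentences`, and the use of sentence counts as the word frequency that sets node size are all hypothetical. Unordered word pairs are enumerated with `itertools.combinations`, which yields the same pairs as a Cartesian product of a sentence's words with itself once self-pairs and reversed duplicates are discarded.

```python
from collections import Counter
from itertools import combinations


def build_network(sentences, threshold=0.1):
    """Return (nodes, edges): nodes maps word -> number of sentences containing
    it, edges maps (word_a, word_b) -> Jaccard coefficient above the threshold."""
    # Record the set of sentence indices in which each word occurs.
    occurrences = {}
    for idx, words in enumerate(sentences):
        for word in set(words):
            occurrences.setdefault(word, set()).add(idx)

    # Count the sentences that contain both words of each unordered pair.
    pair_counts = Counter()
    for words in sentences:
        pair_counts.update(combinations(sorted(set(words)), 2))

    # Step 2: keep only edges whose Jaccard coefficient exceeds the threshold.
    edges = {}
    for (a, b), both in pair_counts.items():
        union = len(occurrences[a]) + len(occurrences[b]) - both
        score = both / union
        if score > threshold:
            edges[(a, b)] = score

    # Steps 1 and 3 combined: keep as nodes only words with at least one edge.
    connected = {word for pair in edges for word in pair}
    nodes = {word: len(occurrences[word]) for word in connected}
    return nodes, edges


# Hypothetical, already preprocessed sentences (lists of nouns).
sentences = [
    ["region", "economy", "support"],
    ["region", "economy"],
    ["climate", "risk"],
    ["region", "digital"],
    ["dividend"],
]
nodes, edges = build_network(sentences, threshold=0.4)
print(sorted(edges))
print(nodes)
```

With the threshold of 0.4 used in this toy example, "dividend" (which never co-occurs with another word) and "digital" (whose only edge has a coefficient of 1/3) are dropped as isolated nodes. The returned frequencies and coefficients could then be passed to a graph-drawing library to set node sizes and edges.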

4 Results

The visualization results are shown in Figs. 5, 6, 7, 8, 9 and 10.

Fig. 5. City banks in 2021

Fig. 6. First-tier regional banks (regional banks) in 2021

Fig. 7. Second-tier regional banks in 2021

Fig. 8. Words that increased compared to 2016 (city banks)

Fig. 9. Words that increased compared to 2016 (first-tier regional banks (regional banks))

Fig. 10. Words that increased compared to 2016 (second-tier regional banks)

The co-occurrence network diagrams show that each group of banks had its own approach to the ESG fields and management strategies. Moreover, there were notable differences in awareness. Specifically, city banks strongly focused on environmental issues among ESG-related matters and emphasized digital transformation. In the first-tier regional banks, characteristics such as an awareness of contributing to local communities and economies were observed (Figs. 5, 6, 8 and 9), whereas in the second-tier regional banks, a lower frequency of ESG-related words than in the other groups and a lack of digitalization-related words were noted, which suggests a delay in digitalization efforts (Figs. 7 and 10).

5 Survey

The visualization results are easy to understand, even for beginners in management studies, and could therefore be used as teaching materials for novices. To examine this, a survey was conducted with 10 respondents, comprising students from the Faculty of Management and Economics and bank employees. The survey examined the visibility of the visualization results, the ease of comparison between groups, and the educational utility. The survey results are summarized in Table 4.

Table 4. Survey Results

Based on these results, we can conclude that the visualizations were evaluated as intuitive overall. In particular, the question on how helpful the diagrams were in understanding ESG-related information received the highest average rating, 4.3 with a standard deviation of 0.41, indicating a very positive assessment. This suggests that ESG-related information becomes easier to understand when the vast textual data in integrated reports are converted into co-occurrence network diagrams. Moreover, the question on whether the co-occurrence network diagrams made it easier to learn about sustainable management strategies received an average rating of 4.2 with a standard deviation of 0.43, indicating that these visualization results could be used as teaching materials for beginners in management studies.

6 Conclusion and Future Works

This study visualized integrated reports in the banking industry using co-occurrence network diagrams to analyze ESG-related issues in the industry. Furthermore, we considered the visualization results as potential teaching materials for beginners and conducted a survey. The results confirmed that the focus areas of the industry as a whole can be understood efficiently and quickly from the vast amount of textual data in integrated reports. Moreover, the visualization results can provide guidelines related to management strategies and can serve as suitable teaching materials for beginners in management studies. Future challenges include expanding the categories for a more detailed analysis and considering how to reflect the quality of integrated reports in the visualization.