1 Introduction

Financial information management is a basic unit of university organization. It is the core of transaction management and an important guarantee of the school's normal operation, linking the school's departments, units, and personnel closely together. The amount of data involved is enormous: a university can collect roughly \(2.25 \times 10^{21}\) bits of data in a single year, and this volume keeps growing at about 60% annually. The accumulation of such massive data and the rapid progress of information technology have raised the management of financial information to a whole new level of scientific rigor. Applying new technologies such as big data, data mining, decision tree algorithms, and business intelligence to the construction and management of colleges and universities in order to achieve scientific decision-making is the primary development goal at present. With the development of computer technology, network technology, and financial business processes, the massive data of financial information systems has become a huge resource for data analysis. Through the rational use of this data, the ideas of data analysis and business intelligence (BI) can be effectively combined, integrating financial management thinking with business intelligence. Data analysis and decision making are regarded as the basis for improving the scientific rigor and rationality of university financial management, and this has become a hot research topic for researchers at home and abroad [1].

2 Research on the Technology of Financial Management and Decision Making in Colleges and Universities

2.1 The System Architecture of the Financial Management and Decision-Making System in Colleges and Universities

At present, data information technology and business intelligence technology have gradually begun to be applied to university financial information management systems: school financial data resources are integrated for decision analysis, and management efficiency is improved. For example, state universities such as the University of Michigan have proposed BI projects that integrate financial data and construct decision support systems. Purdue University built a new collaborative integration system for the school and proposed a BI construction road map; the BI structure and program were approved in 2014, and construction of the project started in 2015. Mature BI systems built this way use business intelligence technology to study the data generated by colleges and universities in depth. This research covers not only computer-technology topics such as data management and data warehousing, but also the use of university information resources, performance evaluation, and student analysis reports, with all results derived from business intelligence system data [2, 3]. Figure 1 shows the university BI system architecture.

Fig. 1 University BI system architecture

2.2 Construction of Data Warehouse for Financial Management and Decision-Making in Colleges and Universities

The concept of the "data warehouse" was first put forward by Bill Inmon, who described it as a subject-oriented, integrated, non-updatable (non-volatile), and time-variant collection of data [4] that helps managers conduct business analysis and decision analysis. A data warehouse comprises data sources, a preparation (staging) area, the warehouse database, data marts, a data and knowledge mining library, and various management and application tools. After the warehouse is built, the first step is to extract data into the preparation area, where it is purified; the processed data is then loaded into the data warehouse; finally, the data is distributed into data marts according to the needs of different users. Through OLAP on the data warehouse, users perform decision analysis or knowledge queries over the warehouse data [5] (Fig. 2). A minimal code sketch of this flow follows the figure.

Fig. 2 Data warehouse architecture
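To make the extract-purify-load-mart sequence concrete, the following is a minimal Python sketch using pandas and SQLite. The file name, table names, and column names are illustrative assumptions, not the system described in this paper.

```python
# Minimal sketch of the extract -> purify -> load -> mart flow described
# above. File names, column names, and the SQLite target are illustrative
# assumptions, not the paper's actual implementation.
import sqlite3

import pandas as pd

# Extract: pull raw records from a source system into the preparation area.
raw = pd.read_csv("budget_source.csv")  # hypothetical source file

# Purify: drop duplicates and rows missing the key fields.
staged = raw.drop_duplicates().dropna(subset=["dept_id", "amount"]).copy()

# Transform: normalize a metric unit (e.g. thousands of yuan -> yuan).
staged["amount"] = staged["amount"] * 1_000

# Load: write the cleaned data into the warehouse table.
with sqlite3.connect("warehouse.db") as conn:
    staged.to_sql("fact_budget", conn, if_exists="append", index=False)

    # Data mart: a department-level aggregate serving one class of users.
    mart = staged.groupby("dept_id", as_index=False)["amount"].sum()
    mart.to_sql("mart_dept_budget", conn, if_exists="replace", index=False)
```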

A data warehouse is a central repository based on a specific data structure and related applications, providing a consistent data source for analysis and reporting. The university financial data warehouse model covers the whole workflow, from raw data through statistical analysis to the support of decision analysis [6]. Figure 3 shows the data warehouse architecture of the financial management and decision-making system of colleges and universities.

Fig. 3 Data warehouse architecture of financial management and decision-making system in colleges and universities

The data analysis system mainly comprises data collection and analysis, the construction of the data warehouse, and the establishment of the information center [7].

Data acquisition: Data collection includes the receiving and processing of data, data inspection, and the loading of interface data.

Data warehouse: Construction is completed by two ETL processes. The first ETL is subject-oriented and integrates data from the source systems into the base model library; the second ETL is oriented to analysis themes and transforms data from the base model library into the analytic model library.

The information center mainly completes the transformation from data to information. This process includes three layers. Data layer: data marts, data mining, and so on. Business logic layer: the application layer for statistical analysis, such as ad hoc queries, data mining, and thematic analysis. Interface layer: completes the display of financial information.

System management includes the following. Metadata management covers interface metadata, index-management metadata, and process-management metadata, and implements data updating and data retransmission. Security management is responsible for permission management for reports, the data warehouse, and APs, as well as log management, user authorization management, and unified authentication management.

Data quality management deals with data-related problems that arise when data is first loaded into the data warehouse and when data is first transformed into the new format required by the application system. Typical issues include data integrity, default-value handling, data ranges, data types, data relevance, and data quality monitoring.

System parameter configuration: Business parameter configuration sets thresholds for key business parameters, implements the early-warning function, and generates and submits early-warning reports. It also provides database connection parameters, report upload interface parameters, and data source interface parameter configuration.
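As a small hedged illustration of the threshold-based early-warning function just described, the sketch below checks indicators against configured thresholds. The indicator names, threshold values, and trigger directions are invented for the example.

```python
# Sketch of the threshold-based early-warning check described above.
# Indicator names, thresholds, and directions are illustrative assumptions.
THRESHOLDS = {
    # indicator: (threshold, direction that triggers a warning)
    "budget_execution_rate": (0.50, "below"),
    "temporary_payment_ratio": (0.20, "above"),
}

def check_warnings(indicators):
    """Return warning messages for indicators that cross their threshold."""
    warnings = []
    for name, value in indicators.items():
        if name not in THRESHOLDS:
            continue
        threshold, direction = THRESHOLDS[name]
        if (direction == "below" and value < threshold) or \
           (direction == "above" and value > threshold):
            warnings.append(f"{name}={value:.2f} is {direction} {threshold}")
    return warnings

print(check_warnings({"budget_execution_rate": 0.35,
                      "temporary_payment_ratio": 0.25}))
```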

Scheduling management drives the execution of the financial system steps, from data acquisition, extraction, and conversion through data loading, updating, and retransmission.

2.3 NoSQL-Based Data Integration

Data integration is the process of merging data from different sources. Data pre-processing handles noisy data, heterogeneous data, and the integration of structured, unstructured, semi-structured, and missing data, so that university financial data can be integrated even when the data is complicated. The integration approach uses NoSQL as a middleware model: university financial systems store large amounts of data with complex structure, and the NoSQL middleware model accommodates large-scale distributed storage [8]. Data from different sources is collected and exchanged in real time; finally, the data is stored in the data warehouse, preserving the form of the data structure. Based on an analysis of data dimensions and data granularity, an ETL server loads the original data into the corresponding fact and dimension tables. Figure 4 shows the architecture of the NoSQL middleware model.

Fig. 4 NoSQL middleware model architecture

The main steps of data integration are as follows:

Data sources: data integration merges data from different sources. First, structured and unstructured data are cleaned to eliminate valueless data, ensuring the value, efficiency, and reliability of data utilization.

Data set: data reflecting the same theme are grouped by cluster analysis, so that they can be assembled according to the principle of source consistency.

Block storage: data at the end of its life cycle is migrated and archived, typically in the school's data center, which facilitates data storage, retrieval, and recovery.

One of the significant features of the data is its extensive sources and complex structure; data integration therefore requires extracting and integrating the required data. Relationships and entities are extracted from the data sources, and after data association and aggregation a unified data structure is used for storage. First, data integration cleans the structured and unstructured data and eliminates worthless data, ensuring the usefulness and reliability of the data [9]. Second, data reflecting the same topic are grouped by cluster analysis and, following the principle of source consistency, become a collection of information assets worth preserving. Finally, data at the end of its life cycle is migrated and archived permanently in digital archives or the school data center, facilitating storage, retrieval, and recovery [10]. The data integration process is shown in Fig. 5; a small code sketch follows the figure.

Fig. 5 Data integration
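As a toy illustration of the integration idea, the sketch below treats a list of Python dictionaries as a stand-in for schema-free NoSQL documents and extracts a unified record structure suitable for fact and dimension tables. All field names are assumptions made for the example.

```python
# Sketch of the NoSQL-middleware idea: heterogeneous source records are
# kept as schema-free documents, then a unified structure is extracted
# and loaded into fact/dimension tables. The field names are assumptions.
documents = [  # document-store stand-in for what a NoSQL middleware holds
    {"src": "payroll", "dept": "Physics", "amt": 1200.0, "date": "2023-01"},
    {"src": "tuition", "department": "Physics", "income": 8000.0,
     "month": "2023-01"},  # same theme, different field names
]

def normalize(doc: dict) -> dict:
    """Map source-specific fields onto one unified record structure."""
    return {
        "department": doc.get("dept") or doc.get("department"),
        "amount": doc.get("amt") if "amt" in doc else doc.get("income"),
        "period": doc.get("date") or doc.get("month"),
        "source": doc["src"],
    }

dim_department = sorted({normalize(d)["department"] for d in documents})
fact_rows = [normalize(d) for d in documents]
print(dim_department, fact_rows, sep="\n")
```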

2.4 Decision-Making Process Based on Knowledge Discovery

Data mining techniques study large amounts of data against identified business goals, find regular and useful information in the data, and model the data further. With large amounts of data processed and analyzed, managers can make timely, scientific management decisions. Data mining is an iteratively refined process, proceeding from data to model and then to result. From large databases and data warehouses, it digs out previously unknown, valid, and practical information, providing comprehensive and effective support for decision-making [11].

In specific applications, data mining can be divided into five stages: data selection, data pre-processing, data transformation, data mining, and knowledge evaluation. The general process of data mining is shown in Fig. 6.

Fig. 6 The general process of data mining

Data analysis itself is also studied, taking the service-oriented architecture (SOA) analysis and design methodology as the research object. On the basis of ensuring normal operation of the university financial management information system, data collection, summarization, and analysis of financial and related data are performed using information exchange and data sharing technology, with service-oriented (SOA) analysis and design methods adopted throughout. First, the data is classified, and data cleaning, data exchange, and data sharing are extracted into composite service parts. Then business applications are decomposed into individual unit components based on business processes, data flows, and business rules [12]. Finally, the secondary data components are completed. The fact and dimension tables of the data warehouse are designed on a business-driven model, and the data remains consistent. This data construction provides colleges and universities with an extensible, suitable, and configurable management system for financial management decision-making.

The main steps are as follows. First, become familiar with the business process: clearly define the data flows, business requirements, and related role assignments, and decompose the data processing service components step by step into first-level business units. Second, decompose the business nodes according to the generated critical path, re-form the business function units, and further extend the data model of the business components and the standard. Finally, complete the scheduling role of the system architecture to handle the requirements of data flow and business.

3 Decision Tree Theory

The decision tree algorithm is widely used as a prediction model algorithm. It first classifies a large amount of data, then finds valuable information among the data, helping decision-makers select the optimal scheme. The decision tree algorithm is an inductive learning algorithm that learns from examples. Its advantages are high classification accuracy and the ability to display the important decision attributes.

The decision tree is an important classification technique in data mining. Its main function is to build a model, based on the attributes and class labels of a training data set, that classifies the existing training data and can then be used to classify new data. Genetic algorithms, Bayesian algorithms, and decision tree algorithms are all classification algorithms; among them, the decision tree algorithm is chosen because it is easy to understand, efficient, and easy to transform into if-then classification rules, which are widely used and studied. It plays an important role in data mining. Its main algorithms include ID3, CART, and C4.5 [2]. The core of the algorithm is the process of constructing a decision tree model: the data samples of the training set are analyzed, the decision tree is constructed, and the constructed model is then used to analyze and predict new data. Figure 7 shows a decision tree model used in a business system, and Fig. 8 shows the flow of the decision tree algorithm; a minimal code example follows the figures.

Fig. 7 Intent map of buying goods

Fig. 8 Decision tree algorithm flow
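To make the workflow of Fig. 8 concrete, the following minimal example trains an entropy-based decision tree with scikit-learn on invented "buying goods" data in the spirit of Fig. 7. The features, labels, and hyperparameters are purely illustrative.

```python
# A minimal example of the train-then-predict decision tree workflow in
# Fig. 8, using scikit-learn. The toy "buying goods" data is invented.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [age, income_level]; label: 1 = buys, 0 = does not buy.
X = [[25, 1], [30, 3], [45, 2], [52, 3], [23, 1], [40, 1], [60, 2], [35, 3]]
y = [0, 1, 1, 1, 0, 0, 1, 1]

model = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                               random_state=0).fit(X, y)
print(export_text(model, feature_names=["age", "income_level"]))
print(model.predict([[28, 3]]))  # predict intent for a new customer
```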

3.1 ID3 Algorithm

The predictive data mining analysis method used here is a classification algorithm: a model that distinguishes data classes and concepts, whose main goal is to find models that accurately describe the data set. The decision tree method is the best-known classification algorithm. In 1986, Quinlan proposed the ID3 decision tree classification algorithm on the basis of information theory, which is the algorithm's theoretical foundation. ID3 selects the attribute with the maximum information gain at each node as the test attribute, divides the sample set into subsets corresponding to that attribute's values, and grows a new node of the decision tree for each subset. ID3 is thus a top-down greedy search method.
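The quantity ID3 maximizes at each node can be sketched as follows: information gain is the entropy of the parent set minus the weighted entropy of the children induced by an attribute. The toy data set and attribute names below are illustrative assumptions.

```python
# Minimal sketch of the quantity ID3 maximizes at each node: information
# gain = entropy(parent) - weighted entropy of the children induced by an
# attribute. The toy data set and attribute names are illustrative.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attr, label="class"):
    parent = entropy([r[label] for r in rows])
    children = {}
    for r in rows:
        children.setdefault(r[attr], []).append(r[label])
    weighted = sum(len(ls) / len(rows) * entropy(ls)
                   for ls in children.values())
    return parent - weighted

data = [
    {"income": "high", "student": "no",  "class": "buy"},
    {"income": "high", "student": "yes", "class": "buy"},
    {"income": "low",  "student": "no",  "class": "no_buy"},
    {"income": "low",  "student": "yes", "class": "buy"},
]
# ID3 would split on whichever attribute yields the larger gain.
print(information_gain(data, "income"), information_gain(data, "student"))
```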

3.2 C4.5 Algorithm

In view of the deficiencies of the ID3 algorithm, the improved C4.5 algorithm was proposed. C4.5 is a decision tree algorithm based on information entropy; its advantage is to replace ID3's information gain with the concept of the information gain ratio as the split-attribute selection criterion, which avoids ID3's bias toward many-valued attributes while inheriting the basic idea of the decision tree. C4.5 also handles continuous attributes and missing data better, and introduces tree pruning techniques, improving classification accuracy [6].
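The gain ratio can be sketched in the same style as the ID3 example above. This illustrates the standard C4.5 criterion, not this paper's implementation; the data format matches the toy data set of the previous sketch.

```python
# Sketch of the C4.5 selection criterion: gain ratio = information gain /
# split information (the entropy of the partition itself), penalizing
# attributes with many distinct values. Data format matches the ID3 sketch.
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, attr, label="class"):
    parent = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label])
    gain = parent - sum(len(g) / len(rows) * entropy(g)
                        for g in groups.values())
    split_info = entropy([r[attr] for r in rows])  # entropy of the partition
    return 0.0 if split_info == 0 else gain / split_info
```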

The basic idea of the decision tree algorithm: first, the training set is regarded as the root node, an appropriate criterion is determined, and the split attribute is selected. According to the different values of the split attribute, the training data set is divided into several sub data sets, which form the nodes of the next level. Each sub node is then treated as a root node and the above steps are repeated until the terminating condition is satisfied; construction then stops and the required decision tree has been built.

Decision tree construction based on the C4.5 algorithm: Let \(E\) be the set of training samples. If \(E\) is empty, return a single failure node. If all samples in \(E\) belong to the same class \(C\), return a single node labelled \(C\). Let \(A\) be the set of candidate attributes; if \(A\) is empty, return a single node labelled with the class that occurs most often in \(E\). Otherwise, every element of \(A\) is traversed. If an element \(A_{j}\) of \(A\) is a continuous attribute with minimum value \(B_{1}\) and maximum value \(B_{n}\), candidate thresholds \(B_{h} = B_{1} + j(B_{n} - B_{1})/n\) are generated in a loop over \(j = 2, \ldots, n-2\), and the threshold giving the maximum information gain is chosen for \(A_{j}\). Let \(X\) be the attribute in \(A\) with the maximum information gain, with value set \(\{ X_{j} \mid j = 1,2,\ldots,m \}\). According to these values, child nodes \(X_{1}, X_{2}, \ldots, X_{m}\) are constructed, and the remaining subtrees are established by repeating the above process on each child.
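Since the candidate-threshold formula in the source text is garbled, the sketch below shows one common C4.5-style treatment of a continuous attribute instead: evaluating the midpoints between adjacent sorted values as candidate split points. This is an assumption about the intended scheme, not the paper's exact formula.

```python
# A common C4.5-style treatment of a continuous attribute: evaluate the
# midpoints between adjacent distinct sorted values as candidate splits.
def candidate_thresholds(values):
    """Midpoints between adjacent distinct sorted values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

print(candidate_thresholds([3.0, 1.0, 2.0, 2.0, 5.0]))
# -> [1.5, 2.5, 4.0]; C4.5 would pick the threshold with the best gain ratio.
```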

3.3 Improved Metric-Based Decision Tree (MBDT) Algorithm

An improved decision tree algorithm is proposed: the metric-based decision tree (MBDT), which combines a linear classifier with a decision tree. Building the tree this way reduces the size of the tree and better improves the efficiency of decision tree classification.

MBDT classifies using Mahalanobis distances, choosing the most efficient classification method for the feature subset at each node. If some subset cannot be classified effectively, the selection of the optimal classification method falls back to threshold judgment. The algorithm is supervised: it starts from a pre-labelled, already-classified training set.

The MBDT algorithm is recursive and uses the threshold method as its branch criterion. Suppose \(T = \{ t_{i} \}\), \(1 \le i \le c\), represents the set of sample classes, where \(c\) is the number of classes. Suppose \(A = \{ a_{i} \}\), \(1 \le i \le m\), represents the feature space, where \(m\) is the number of attributes, and \(B = \{ b \mid b \subseteq A \}\) is the power set of \(A\). Suppose \(\beta_{e}\) is the misclassification threshold and \(\beta_{c}\) is the cross-misclassification threshold.

For a subset \(C \subseteq T\) containing samples of several classes, \(b \in B\) is a set of attributes under which the samples are classified. The goal is to select \(b_{best} \in B\) such that most of the samples in \(C\) are correctly classified. If the proportion of class \(t_{i}\) samples wrongly classified as class \(t_{j}\) exceeds the threshold, \(t_{i}\) and \(t_{j}\) are merged into the same node and wait for the next layer to continue the classification.
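A minimal sketch of Mahalanobis-distance classification, as it might be used inside an MBDT node, is given below on synthetic data. The class statistics, class names, and data are invented for illustration.

```python
# Minimal sketch of Mahalanobis-distance classification inside a node:
# a sample is assigned to the class whose mean it is closest to under
# that class's covariance. The data here is synthetic.
import numpy as np

def mahalanobis(x, mean, cov):
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def classify(x, class_stats):
    """class_stats: {label: (mean_vector, covariance_matrix)}."""
    return min(class_stats, key=lambda c: mahalanobis(x, *class_stats[c]))

rng = np.random.default_rng(0)
a = rng.normal([0, 0], 1.0, size=(50, 2))
b = rng.normal([4, 4], 1.0, size=(50, 2))
stats = {"t1": (a.mean(0), np.cov(a.T)), "t2": (b.mean(0), np.cov(b.T))}
print(classify(np.array([3.5, 4.2]), stats))  # -> "t2"
```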

In the experiment, the data of each group were divided into two parts, A and B. Both parts are used for training and validation, producing four classification results: training on A and validating on A, training on A and validating on B, training on B and validating on A, and training on B and validating on B. Each result obtained by the improved decision tree algorithm is a confusion matrix: the elements on the diagonal count correctly classified samples, while element \([i,j]\) (\(i \ne j\)) counts class \(i\) samples misclassified as class \(j\). Figure 9 shows the validation process of the classification results for the contrast and consistency of \(t_{1}\) and \(t_{5}\); a small sketch of this scheme follows the figure.

Fig. 9 Improved algorithm flow chart
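The A/B validation scheme can be sketched as follows with scikit-learn on synthetic data, producing the four confusion matrices described above. This is an illustration, not the paper's experiment.

```python
# Sketch of the two-part (A/B) validation scheme: each half trains a
# model, each model is checked on both halves, yielding four confusion
# matrices. Uses scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
half = len(X) // 2
parts = {"A": (X[:half], y[:half]), "B": (X[half:], y[half:])}

for train in ("A", "B"):
    model = DecisionTreeClassifier(random_state=0).fit(*parts[train])
    for val in ("A", "B"):
        Xv, yv = parts[val]
        cm = confusion_matrix(yv, model.predict(Xv))
        # Diagonal entries count correct samples; cm[i, j] (i != j) counts
        # class i misclassified as class j.
        print(f"train {train} / validate {val}:\n{cm}")
```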

4 Research on Financial Management in Colleges and Universities Based on the Improved Decision Tree

4.1 Research on Early Warning Analysis

Against the actual problems, the ID3 algorithm and its improved versions are tested, and the C5.0 algorithm is adopted to perform early-warning analysis on the implementation progress of the university financial budget. Based on the financial budget data of previous years, the various kinds of financial budgets are classified, and the data mining and early warning system is designed accordingly. Designing a financial budget early-warning analysis system based on the C5.0 algorithm and applying it to the implementation progress of the university budget is a new problem, and the research is therefore highly innovative.

Data mining here is composed of two parts: the data warehouse and the prediction analysis system. Figure 10 shows the design framework of the early-warning analysis. The data warehouse includes the basic information of each university department: department name, project name, the opening balance of the project, the opening balance of temporary payments, budgetary income, temporary loans, repayments of temporary payments, net income, net expenditure, other income, project balances, temporary balances, and other data. The prediction analysis system comprises the maximum tree growth module, the pruning and optimal selection module, the incremental learning module, and the weight generation module.

Fig. 10 The overall architecture of the early warning analysis system

System design includes the following aspects. Database: data is the source of decision-making knowledge in the forecasting system as well as the basis for forecasting. The data of the financial budget statements is saved in the database and must be processed before storage; the steps are preliminary classification, screening, and finishing. Weight generation module: the weight of each node in the decision tree is formed by a weight distribution method together with expert judgment and statistical data analysis, and the resulting risk weights are introduced into the ID3 algorithm.

The data flow chart of the early warning analysis system is shown in Fig. 11.

Fig. 11 The data flow chart of the early warning analysis system

Incremental learning module: an incremental learning strategy can save a large amount of the time spent analyzing massive financial data. The improved algorithm conveniently handles growing data, and combining incremental learning with the ID3 algorithm is beneficial: a decision tree is first generated by ID3, and the tree is then revised with newly added data through incremental learning.

Data processing and data warehouse: the data warehouse is the database that organizes the stored data. The original data is obtained from the traditional database, then organized into a data layer according to the decision theme, which is further consolidated into a comprehensive data layer. The technological basis of data mining is artificial intelligence: the improved decision tree algorithm mines and classifies the data to obtain new knowledge, and based on the knowledge obtained from the mining analysis, decision-makers can analyze the target event scientifically and rationally.

4.2 Construction of Financial Budget Data Warehouse

The financial budget data warehouse is designed from three aspects: the determination of data sources, the preprocessing of data, and the establishment of the three-level model of the data warehouse.

Determination of data sources and preprocessing of data: after extraction, cleaning, conversion, aggregation, and integration, the source data is migrated from the business operating system to the data warehouse. All the data sources are derived from the university's financial budget database.

Determination of data sources: the data warehouse is developed and studied around the themes of management decision analysis. Data is captured from the financial information databases of the colleges, centralized on the database server, and loaded to form the data source of the data warehouse.

Data preprocessing: the preprocessing of data is the key step in data integration. Data cleaning removes noise and data unrelated to the subject. Data conversion includes field, coding, and metric unit conversion, making the data forms in the data warehouse consistent. Data from the business operation system, source database, data warehouse, and data marts is aggregated along given dimensions to keep the storage of large amounts of data manageable. Data integration unifies the data from different sources, and data migration moves data from one environment to another, for example from files to the source database and from the source database to the data warehouse.
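As a small hedged sketch of these preprocessing steps in pandas (cleaning, coding and unit conversion, then aggregation along a dimension), with invented column names and code mappings:

```python
# Sketch of the preprocessing steps named above: cleaning, coding and
# unit conversion, then aggregation along a dimension. Column names and
# code mappings are illustrative assumptions.
import pandas as pd

src = pd.DataFrame({
    "dept_code": ["01", "02", "01", None],
    "amount_k_yuan": [12.5, 8.0, 3.5, 4.0],  # thousands of yuan
    "month": ["2023-01", "2023-01", "2023-02", "2023-02"],
})

clean = src.dropna(subset=["dept_code"])  # cleaning
clean = clean.assign(
    dept=clean["dept_code"].map({"01": "Physics", "02": "Chemistry"}),
    amount_yuan=clean["amount_k_yuan"] * 1_000,  # unit conversion
)
# Aggregation along the time dimension before loading into the warehouse.
summary = clean.groupby(["dept", "month"], as_index=False)["amount_yuan"].sum()
print(summary)
```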

5 Conclusion

This paper has studied the application of data mining algorithms in the financial management of colleges and universities. The main component of the university financial management and decision support system architecture is the data warehouse, and the whole architecture exists to support data-driven financial management decisions. Heterogeneous data of various kinds is stored in the data warehouse, whose design proceeds from the analysis requirements, analysis dimensions, indexes, and so on. Mining tools are used to analyze the cleaned data, helping managers make accurate and quick decisions and freeing decision-makers from blindness. As the management of colleges and universities grows increasingly complicated, financial budget management can strengthen school financial management and make full use of information resources. Scientific decision-making is the key to solving the problem, and ex-ante budgeting, in-process control, and ex-post evaluation are effective ways to achieve it.