Study on the Portrait of Online Learners’ Personality and Attitude

Xu, Tao; Zou, Maoyang; Fan, Zhongyue; Chen, Yuxin; Zhang, Yiran; Min, Pan

doi:10.1007/978-3-031-06788-4_35

Tao Xu, Maoyang Zou (ORCID: orcid.org/0000-0003-0813-9392), Zhongyue Fan, Yuxin Chen, Yiran Zhang & Pan Min

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13339))

Included in the following conference series:

International Conference on Artificial Intelligence and Security

Abstract

To help colleges better understand students’ learning personality and attitude, better guide students’ learning, and improve the quality of teaching, this paper uses K-Means, MiniBatchKMeans, and Birch to analyze students’ learning personality and attitude. After comparing the three algorithms, we analyze the clustering results of K-Means and divide students’ learning personalities into three categories, “Active”, “Normal”, and “Dull”, and their attitudes into four categories, “Negative and lazy”, “Perfunctory and active”, “Medium-general”, and “Proactive”. A function model is fitted by multiple linear regression to predict students’ scores.

1 Introduction

The development of education has entered a new stage of informatization and is transforming from digital education to smart education supported by technologies such as data mining and machine learning [1]. To achieve the goal of smart education, different management requirements should be adopted for students with different personalities and attitudes, and early warning should be given to students who may fail. To meet these needs, this paper collects MOOC student learning behavior data and studies and compares the personality and attitude of students using different machine learning algorithms. Students’ learning personality is divided into 3 categories and students’ learning attitude into 4 categories, and a function model fitted by multiple linear regression is used to predict students’ scores.

2 Related Work

In terms of clustering algorithms, S. M. Mostafa et al. [2] used nine clustering algorithms to cluster eight datasets and compared the algorithms on seven performance measures. H. Cui et al. [3] proposed a K-means++-based clustering method for social e-commerce users and showed that it can accurately classify such users.

There are also many research results on the analysis of learner behavior data. For example, Ruipérez-Valiente et al. [4] used learner data provided by a regional MOOC provider in Jordan to explore differences in learners’ behavior and preferences, and found that the regional provider attracted younger learners, women, and learners with lower levels of education. Juan Zambrano et al. [5] put forward six measures of student performance based on data from online courses provided by the Massachusetts Institute of Technology; based on these indicators, they divided the student population into multiple categories and analyzed the resource usage of the students in each category. Tseng et al. [6] analyzed MOOC data to extract learners’ behavior characteristics and then used Ward’s method and K-Means clustering to classify learners into three categories: “active learners”, “passive learners”, and “bystanders”. Their results show that active learners have a higher completion rate and achieve better final grades. Tang Mingwei et al. [7] analyzed students’ study behaviors through big data and proposed a hidden Markov model that relates classroom behavior to student performance, so that classroom content can be adjusted accordingly. Zhang Xiaoying [8] applied the K-Means algorithm to classify and analyze the various behaviors of students at school, established a mathematical model, and applied correlation analysis to explain and predict the behavior of college students. Zhang Liyuan et al. [9] used data analysis and machine learning methods to study student behavior and found that students’ online learning performance is highly correlated with their learning behavior. Deng Tianping et al. [10] performed cluster analysis on student learning data and final exam scores from a MOOC platform and explored the impact of each learning dimension on the learning effect in different class groups. Tian Chunzi et al. [11] used K-Means and DBSCAN to analyze multiple types of data generated by students during school and compared the two algorithms.

In terms of score prediction, M. Zaffar et al. [12] proposed a hybrid feature selection framework to predict student performance. Jin Xiuling [13] optimized the SVM model parameters and established a GA-SVM student performance prediction model. Zhao Xiaoyan [14], based on multi-source data fusion technology, fused various data of college students, including sports, consumption, and social behavior, and used support vector machines (SVM) and machine learning (ML) to predict college students’ English scores. Tian Yu et al. [15] proposed a novel multi-feature neural network model to predict college entrance examination scores and verified the effectiveness of the algorithm through simulation experiments. Li Longzhen [16] used the C4.5 decision tree to build a student score prediction model, with a prediction accuracy of about 88%. Ren Ge et al. [17] used a BP neural network to predict students’ scores in multiple courses, with a prediction accuracy of up to 70%. Yu Tiesuo et al. [18] used SVR (Support Vector Regression) to predict performance and used the prediction results for statistical analysis and early warning.

In this paper, different clustering algorithms are used to analyze personality and attitude from students’ behavior data. After comparing the quality of the different clustering methods, we analyze the K-Means clustering results for students’ learning personality and attitude. Finally, students’ scores are predicted by multiple linear regression.

3 Experimental Process and Result Analysis

This paper focuses on classifying students’ personality and attitude and on predicting student performance for a course on an online learning platform. The process of classifying students’ personality and attitude mainly includes:

1. Data acquisition, data cleaning, and preprocessing.
2. Classification by the K-Means, MiniBatchKMeans, and Birch algorithms.
3. Comparison of the three algorithms and analysis of their clustering results.

The prediction process using multiple linear regression is as follows:

1. Select the behavior attributes that have a relatively clear linear relationship with student performance as the independent variables.
2. Fit the relationship function between the independent variables and performance through multiple linear regression.
3. Analyze the function model.

3.1 Data Processing

The data source in this paper is the student behavior data and basic student data of a course on a MOOC platform. We extracted the data related to student behavior, including student ID (Id), name (Name), video views (Video), unit detection times (Unit), document reading times (Document), discussion times (Discussion), number of postings (Message), number of logins (Login), and final grade (Score). We then cleaned the data, mainly by deleting students with missing or abnormal field values, which left records for 1685 students.
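
The cleaning step can be sketched as follows. This is a minimal illustration rather than the authors’ code: the record layout mirrors the field names listed above, while the rule for what counts as “abnormal” (a negative count, or a score outside 0 to 100) is an assumption, since the paper does not define it.

```python
# Field names follow the paper; the validity rules are illustrative assumptions.
COUNT_FIELDS = ["Video", "Unit", "Document", "Discussion", "Message", "Login"]


def is_valid(record):
    """Return True if a record has no missing or abnormal field values."""
    for field in COUNT_FIELDS + ["Score"]:
        value = record.get(field)
        if value is None:  # missing field
            return False
        if value < 0:  # assumed rule: counts and scores cannot be negative
            return False
    # Assumed rule: final grades lie on a 0-100 scale.
    return record["Score"] <= 100


def clean(records):
    """Keep only the records that pass every validity check."""
    return [r for r in records if is_valid(r)]


# Made-up sample records, for illustration only.
sample = [
    {"Id": 1, "Video": 12, "Unit": 5, "Document": 8, "Discussion": 2,
     "Message": 1, "Login": 20, "Score": 78},
    {"Id": 2, "Video": None, "Unit": 3, "Document": 4, "Discussion": 0,
     "Message": 0, "Login": 5, "Score": 40},   # missing Video count
    {"Id": 3, "Video": 7, "Unit": 2, "Document": 1, "Discussion": 0,
     "Message": 0, "Login": 9, "Score": 140},  # score outside assumed range
]
cleaned = clean(sample)
print([r["Id"] for r in cleaned])
```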

3.2 Student Personality Analysis

We use K-Means, MiniBatchKMeans, and Birch for cluster analysis and use the contour coefficient (commonly known as the silhouette coefficient) to compare the quality of the algorithms: the larger the contour coefficient, the better the clustering effect. The contour coefficient, or SC index, indicates the degree of aggregation within each cluster and the degree of separation between clusters after clustering. The smaller the distance between samples in the same class and the larger the distance between samples in different classes [19], the larger the value of $SC$ and the better the clustering effect. Therefore, $SC$ is often used as a performance index to evaluate clustering results. Let ${a}_{i}$ denote the average distance between sample $i$ and the other samples in its own cluster, and let ${b}_{i}$ denote the average distance between sample $i$ and the samples in the nearest other cluster. The contour coefficient ${SC}_{i}$ of sample $i$ is then calculated with the following formula, and the contour coefficient of a clustering is the mean of ${SC}_{i}$ over all samples.

$$SC_i = \frac{b_i - a_i }{{\max \left( {a_i ,b_i } \right)}}$$

(1)
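
To make formula (1) concrete, the following sketch computes ${SC}_{i}$ for individual samples and averages it over a small, made-up set of 2-D points. It assumes Euclidean distance and at least two clusters; the function names and the toy data are illustrative and do not come from the paper.

```python
import math


def silhouette_of_sample(index, points, labels):
    """Compute SC_i = (b_i - a_i) / max(a_i, b_i) for one sample."""
    own = labels[index]
    # Group the distances from sample `index` to every other sample by cluster.
    dists = {}
    for j, (point, label) in enumerate(zip(points, labels)):
        if j == index:
            continue
        dists.setdefault(label, []).append(math.dist(points[index], point))
    if own not in dists:
        return 0.0  # common convention for a cluster with a single sample
    a_i = sum(dists[own]) / len(dists[own])
    # b_i: mean distance to the nearest cluster the sample does not belong to.
    b_i = min(sum(d) / len(d) for label, d in dists.items() if label != own)
    return (b_i - a_i) / max(a_i, b_i)


def silhouette_mean(points, labels):
    """Average SC_i over all samples."""
    values = [silhouette_of_sample(i, points, labels) for i in range(len(points))]
    return sum(values) / len(values)


# Toy data: two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
print(silhouette_of_sample(0, points, labels), silhouette_mean(points, labels))
```

A value close to 1 means a sample sits much nearer to its own cluster than to any other, which is why a larger average indicates a better clustering.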

First, we select Document and Discussion from the data and standardize them. We then analyze the selected behavior data with K-Means, MiniBatchKMeans, and Birch to identify students’ personalities. In addition, we plot the contour coefficient of each algorithm against the number of clusters (2, 3, 4, 5, 6, 7, 8). In the graph, blue represents K-Means, orange represents Birch, and green represents MiniBatchKMeans (Fig. 1).
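
The comparison described above could be reproduced along the following lines with scikit-learn. This is a sketch under stated assumptions, not the authors’ implementation: the data are synthetic blobs standing in for the standardized Document and Discussion columns, and the hyperparameters (`n_init`, `random_state`, the Birch `threshold`) are illustrative choices that the paper does not report.

```python
from sklearn.cluster import Birch, KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (Document, Discussion) columns: three blobs.
X_raw, _ = make_blobs(n_samples=300, centers=[(0, 0), (10, 0), (5, 9)],
                      cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X_raw)  # standardize each feature


def build_models(k):
    """Create the three clustering algorithms for k clusters (assumed settings)."""
    return {
        "K-Means": KMeans(n_clusters=k, n_init=10, random_state=0),
        "MiniBatchKMeans": MiniBatchKMeans(n_clusters=k, n_init=10, random_state=0),
        "Birch": Birch(n_clusters=k, threshold=0.1),
    }


# Contour (silhouette) coefficient of every algorithm for k = 2..8.
scores = {}
for k in range(2, 9):
    for name, model in build_models(k).items():
        labels = model.fit_predict(X)
        scores.setdefault(name, {})[k] = silhouette_score(X, labels)

# For each algorithm, the number of clusters with the largest coefficient.
best_k = {name: max(by_k, key=by_k.get) for name, by_k in scores.items()}
for name in scores:
    print(name, best_k[name], round(scores[name][best_k[name]], 4))
```

Plotting `scores[name]` against k for each algorithm gives curves of the kind shown in Fig. 1.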

The clustering results show that K-Means is optimal when students are divided into 3 categories, with a contour coefficient of 0.7695. MiniBatchKMeans is also optimal with 3 categories, with a contour coefficient of 0.7692, and Birch is optimal with 3 categories, with a contour coefficient of 0.7668. Thus all three algorithms achieve their best clustering effect when students are divided into three categories. We number the three personalities 0, 1, and 2 and analyze the results of the three clustering algorithms, as shown in Tables 1, 2 and 3.

3.3 Student Attitude Analysis

Student attitude analysis is carried out in a similar way to personality analysis. First, we select Video, Unit, Document, Message, and Login from the data set as the original clustering data set and standardize the data, then use principal component analysis to reduce the data to 2 dimensions. Principal component analysis, a data processing method [20], converts high-dimensional data containing a large amount of redundant information into a small amount of low-dimensional data that retains the effective information of the original data. Its basic idea is to find a projection transformation matrix that best represents the main characteristics of the original data under the constraint of minimum mean square error [21]. K-Means, MiniBatchKMeans, and Birch are then used to analyze student attitudes based on the selected behavior data, and the contour coefficient is plotted against the number of clusters (2, 3, 4, 5, 6, 7, 8). In the graph, blue represents K-Means, orange represents Birch, and green represents MiniBatchKMeans (Fig. 2).
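
The projection idea can be illustrated with a short NumPy sketch. It assumes standardized input and obtains the projection matrix from a singular value decomposition, which is one standard way to compute principal components; the synthetic five-column data merely stand in for Video, Unit, Document, Message, and Login, and the function names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_project(X, n_components=2):
    """Project centered data onto its leading principal components."""
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, singular_values, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:n_components].T  # projection matrix, shape (n_features, n_components)
    explained = singular_values**2 / np.sum(singular_values**2)
    return X @ W, W, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic data: one shared "activity" factor plus noise in five columns.
activity = rng.gamma(shape=2.0, scale=5.0, size=(200, 1))
X_raw = activity * rng.uniform(0.5, 1.5, size=(1, 5)) + rng.normal(0, 1, size=(200, 5))

Z, W, ratio = pca_project(standardize(X_raw))
print(Z.shape, ratio.sum())
```

The two columns of `Z` are what a clustering algorithm would receive in place of the original five features, and `ratio` reports the share of variance each retained component keeps.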

The clustering results show that K-Means is optimal when students are divided into 3 categories, with a contour coefficient of 0.7695. MiniBatchKMeans is also optimal with 3 categories, with a contour coefficient of 0.7692, and Birch is optimal with 3 categories, with a contour coefficient of 0.7668. Thus all three algorithms achieve their best clustering effect when students are divided into three categories. We number the three personalities 0, 1, and 2 and analyze the results of the three clustering algorithms, as shown in Tables 4, 5 and 6.

3.4 Student Performance Prediction

Student performance prediction uses multiple linear regression to fit a function and then predict performance. Multiple linear regression analysis establishes a forecasting model through the correlation analysis of two or more independent variables and a dependent variable; when the relationship between the independent variables and the dependent variable is linear, it is called multiple linear regression analysis [22]. One way to test the significance of the regression equation is the multiple correlation coefficient: the closer it is to 1, the better the fit [23]. The multiple correlation coefficient is calculated as:

$$R = \sqrt {\frac{{\Sigma \left( {\hat{y}_i - \overline{y}} \right)^2 }}{{\Sigma \left( {y_i - \overline{y}} \right)^2 }}}$$

(2)
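
As a small worked example of formula (2), the sketch below computes R from observed scores $y_i$ and fitted scores $\hat{y}_i$, with $\overline{y}$ taken as the mean of the observed scores. The numbers are made up, and the interpretation of R as a fit measure assumes that the fitted values come from a least-squares regression with an intercept.

```python
import math


def multiple_correlation(y_true, y_pred):
    """R = sqrt(sum((y_hat_i - y_bar)^2) / sum((y_i - y_bar)^2))."""
    y_bar = sum(y_true) / len(y_true)
    explained = sum((y_hat - y_bar) ** 2 for y_hat in y_pred)
    total = sum((y - y_bar) ** 2 for y in y_true)
    return math.sqrt(explained / total)


observed = [2.0, 4.0, 6.0, 8.0]
perfect_fit = [2.0, 4.0, 6.0, 8.0]  # fitted values equal to the observations
mean_only = [5.0, 5.0, 5.0, 5.0]    # a model that always predicts the mean

print(multiple_correlation(observed, perfect_fit))
print(multiple_correlation(observed, mean_only))
```

A perfect fit explains all of the variation in the scores, while a model that only predicts the mean explains none of it.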

By analyzing each attribute against the score, we found that Video, Unit, and Document have a relatively strong linear relationship with Score. We calculated the Pearson correlation coefficients between each of them and Score using SPSS, which were 0.894, 0.935, and 0.937, respectively. We therefore choose Video, Unit, and Document as the independent variables; the linear relationship between each of these three and the score is shown in Fig. 3.
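
For readers without SPSS, the Pearson correlation coefficient can be computed directly from its definition. The sketch below is illustrative only: the `unit` and `score` lists are made-up numbers, not the study’s data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical behavior counts and final grades for five students.
unit = [1, 2, 3, 4, 5]
score = [12, 19, 33, 41, 48]
print(round(pearson(unit, score), 3))
```

Attributes whose coefficient with Score is close to 1 are natural candidates for the independent variables of a linear model.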

In the second step, we use Video, Unit, and Document as the independent variables and Score as the dependent variable, and use SPSS to perform multiple linear regression, obtaining the function model $y = 0.141{x}_{1} + 1.335{x}_{2} + 0.239{x}_{3} - 2.545$, where ${x}_{1}$, ${x}_{2}$, and ${x}_{3}$ denote Video, Unit, and Document, respectively.
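
The fitting step can be sketched with ordinary least squares in NumPy instead of SPSS. This is an illustration under assumptions: the coefficients are copied from the function model as printed above, the input data are synthetic and noiseless, and `fit_linear` and `predict` are hypothetical helper names.

```python
import numpy as np

# Coefficients as printed in the function model above (Video, Unit, Document).
PAPER_COEFS = np.array([0.141, 1.335, 0.239])
PAPER_INTERCEPT = -2.545


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (coefs, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])  # append a column of ones
    solution, *_ = np.linalg.lstsq(A, y, rcond=None)
    return solution[:-1], solution[-1]


def predict(X, coefs, intercept):
    """Evaluate the linear model for one student or a batch of students."""
    return X @ coefs + intercept


rng = np.random.default_rng(0)
# Synthetic behavior counts for 100 students (not the study's data).
X = rng.integers(0, 60, size=(100, 3)).astype(float)
y = predict(X, PAPER_COEFS, PAPER_INTERCEPT)  # noiseless synthetic scores

coefs, intercept = fit_linear(X, y)
print(np.round(coefs, 3), round(float(intercept), 3))

# Predicted score for a hypothetical student: 100 views, 10 units, 50 documents.
student = np.array([100.0, 10.0, 50.0])
print(float(predict(student, PAPER_COEFS, PAPER_INTERCEPT)))
```

Because the synthetic scores are generated without noise, the fit recovers the printed coefficients; with real data the estimates would carry sampling error, which is what the significance tests in SPSS assess.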

3.5 Result Analysis

For student personality analysis, a comparison of K-Means, Birch, and MiniBatchKMeans shows that all three algorithms divide students’ learning personality into three categories when their contour coefficients are largest. All three algorithms perform best on personality 0, second best on personality 2, and worst on personality 1. Of the three, Birch has lower clustering accuracy than the other two. We analyze the three personalities using the K-Means clustering results, as shown in Table 7. The table shows that the behavioral data values of students in category 1 are relatively small, those of students in category 2 are the largest, and those of students in category 3 are in the middle. We divide students into three categories: “active”, “ordinary”, and “dull”. The active type is more enthusiastic about things, the ordinary type is positive and indifferent to things, and the dull type is introverted and indifferent to things.

Table 1. K-Means cluster analysis of student personality.

Table 2. Birch cluster analysis of student personality.

Table 3. MiniBatchKMeans cluster analysis of student personality.

Table 4. K-Means cluster analysis of student attitude.

Table 5. Birch cluster analysis of student attitude.

Table 6. MiniBatchKMeans cluster analysis of student attitude.

Table 7. Student personality clustering results.

Table 8. Model summary.

Table 9. ANOVA.

Table 10. Coefficients.

For the analysis of student attitudes, students’ learning attitudes are divided into 4 categories when the contour coefficients of the three algorithms are largest. All three algorithms perform best on attitude 0 and worst on attitude 2. For attitude 0, K-Means has the lowest accuracy; for attitudes 1, 2, and 3, MiniBatchKMeans has the lowest accuracy. We plot the K-Means result as a radar chart, as shown in Fig. 4, and draw the following conclusions. The first type of student can be summarized as “negative and lazy”; their attitudes are negative learning, laziness, and even giving up learning. The second type is “perfunctory and active”; their attitudes are perfunctory, with positive comments, and easygoing. The third type is “medium-general”, characterized by medium grades, average enthusiasm, and sloppiness. The fourth type is “proactive”, characterized by good scores, high motivation, and active learning. There are 993 “negative and lazy” students, 295 “perfunctory and active” students, 324 “medium-general” students, and 73 “proactive” students.

According to the function model obtained above, we only need to know a student’s Video, Unit, and Document values to predict that student’s performance. To evaluate the quality of the model, we use SPSS to test it. The evaluation results are shown in Tables 8, 9 and 10.

The analysis shows that the coefficients obtained by multiple linear regression for video views (Video), unit detection times (Unit), and document reading times (Document) are 0.141, 0.1335, and 0.239, respectively; the constant term is −2.545; and the significance of each independent variable is less than 0.001, which indicates that each independent variable has a significant influence on the dependent variable. In addition, the multiple correlation coefficient R of the model is 0.974, which is close to 1, and its significance is less than 0.001, indicating a good fit.

4 Summary

In this paper, we clean and preprocess students’ behavior data, then use three clustering algorithms to classify students’ learning personality and attitude and compare the results of the three algorithms. Finally, we select the K-Means algorithm to cluster the data and analyze the model. The personality of students is divided into three categories: “active”, “ordinary”, and “dull”. Students’ attitude is divided into four categories: “negative and lazy”, “perfunctory and active”, “medium-general”, and “proactive”. Multiple linear regression is then used to predict students’ performance, and the model is tested. The study of student behavior data can provide targeted suggestions for future teaching practice and a theoretical basis for the continuous improvement of teachers’ classroom teaching [10].

References

1. Cheng, P.: Learner personalized modeling and user portrait system for online education platform (2019)
2. Mostafa, S.M.: Clustering algorithms: taxonomy, comparison, and empirical analysis in 2D datasets. J. Artif. Intell. 2(4), 511–524 (2020)
3. Cui, H.: A k-means++ based user classification method for social e-commerce. Intell. Autom. Soft Comput. 28(1), 277–291 (2021)
4. Ruipérez-Valiente, J.A., Halawa, S., Slama, R., Reich, J.: Using multi-platform learning analytics to compare regional and global MOOC learning in the Arab world. Comput. Educ. 146, 103776 (2020)
5. Ruipérez-Valiente, J.A., Halawa, S., Slama, R., Reich, J.: Using multi-platform learning analytics to compare regional and global MOOC learning in the Arab world. Comput. Educ. 146, 103776 (2018)
6. Tseng, S.F., Tsao, Y.W., Yu, L.C., Chan, C.L., Lai, K.R.: Who will pass? Analyzing learner behaviors in MOOCs. Res. Pract. Technol. Enhanc. Learn. 11(1), 1–11 (2016)
7. Tang, M., et al.: Research on students’ classroom behavior based on big data analysis and hidden Markov model. J. Phys.: Conf. Ser. 1873(1), 012084 (2021)
8. Zhang, X.: Research on the behavior model of college students based on big data analysis. Electron. Technol. Softw. Eng. 2(2), 1–5 (2021)
9. Zhang, L.: Research on student online learning behavior based on data analysis. J. Yuzhang Norm. Coll. 2(2), 87–91 (2021)
10. Deng, T.: Analysis of student learning behavior based on Mu classroom data. Journal 2(2), 78–82 (2020)
11. Tian, C.: Analysis and research of student behavior based on comprehensive data of colleges and universities based on K-Means and DBSCAN clustering algorithm. Journal 2(32), 86–88 (2021)
12. Zaffar, M.F.: A hybrid feature selection framework for predicting students performance. Comput. Mater. Continua 1(70), 1893–1920 (2022)
13. Jin, X.: Prediction of college entrance examination results based on genetic algorithm and support vector machine model. Journal 2(2), 62–65 (2020)
14. Zhao, Y.: Prediction of English scores of college students based on multi-source data fusion and social behavior analysis. Journal 2(4) (2020)
15. Tian, Y.: College entrance examination score prediction based on multi-feature perception network. Journal 2 (2021)
16. Li, L.: Online learning performance prediction based on decision tree algorithm. Journal 2(1), 130–133 (2021)
17. Ren, G.: Application of BP neural network in early warning of college student performance. Journal 2(10), 53–55 (2020)
18. Yu, T.: Research on the application of SVR regression in performance prediction and early warning. Journal 2(11), 76–80 (2020)
19. Ma, X.: Principal component analysis face recognition algorithm based on BP neural network. Journal 2(1), 140–146 (2021)
20. Zhao, L.: Analysis and research of student behavior based on comprehensive data of colleges and universities based on K-Means and DBSCAN clustering algorithm. Journal 2(36), 226–229 (2007)
21. Ma, J.: Research on big data visualization application based on PCA dimensionality reduction. Journal 2(2), 201–206 (2021)
22. Chen, K.: Analysis of college English band 4 scores based on multiple linear regression. Journal 2(10), 37–39 (2018)
23. Zhang, X.: Application of multiple linear regression in analyzing student performance ranking prediction. Journal 2(5), 154–160 (2018)

Author information

Authors and Affiliations

Chengdu University of Information Technology, Chengdu, 610225, China
Tao Xu, Maoyang Zou, Zhongyue Fan, Yuxin Chen, Yiran Zhang & Pan Min

Corresponding author

Correspondence to Maoyang Zou.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Xiaorui Zhang
Jinan University, Guangzhou, China
Zhihua Xia
Purdue University, West Lafayette, IN, USA
Elisa Bertino

Cite this paper

Xu, T., Zou, M., Fan, Z., Chen, Y., Zhang, Y., Min, P. (2022). Study on the Portrait of Online Learners’ Personality and Attitude. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13339. Springer, Cham. https://doi.org/10.1007/978-3-031-06788-4_35

DOI: https://doi.org/10.1007/978-3-031-06788-4_35
Published: 04 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06787-7
Online ISBN: 978-3-031-06788-4
eBook Packages: Computer Science, Computer Science (R0)
