1 Introduction

Internet addiction disorder refers to excessive internet use that interferes with daily life [1]. Research shows that internet addiction has a negative impact on college students, including poor academic performance [8], health problems [1] and weakened social relationships [11]. It is therefore necessary to identify students’ tendencies toward internet addiction and give them appropriate guidance.

At present, work on internet addiction is concentrated in the field of psychology. Such work focuses on the causes and effects of internet addiction, and few studies calculate the internet addiction level quantitatively. Moreover, the main analysis methods are questionnaires and statistical analysis, which are cumbersome and rely heavily on domain experts. It is therefore necessary to develop a method that estimates students’ internet addiction level quantitatively and automatically.

Fortunately, with the development of the smart campus, students’ behavior data, such as access data and consumption data, are now collected. With these data, it is possible to analyze students’ internet addiction level quantitatively.

In this paper, we propose an approach to estimate students’ internet addiction level from their behavior data. Because no existing method measures students’ addiction level precisely, we estimate it indirectly through an auxiliary task. Specifically, we treat a student’s internet addiction level as a hidden variable that affects the student’s daily time online. In addition, behavior data such as consumption data and the internet access gap reflect a student’s daily activities, which may also influence the time spent online. By predicting students’ daily time online from their behavior and internet addiction level, we obtain the internet addiction level. Along this line, we propose a linear internet addiction (LIA) model and a neural network internet addiction (NIA) model to capture the relationship between students’ behavior data, internet addiction and the time they spend online every day. Furthermore, both models take the regularity of students’ behavior into consideration. Finally, we conduct extensive experiments on a real-world dataset from a Chinese college. The experimental results support the correctness and effectiveness of the proposed models and are consistent with some psychological findings.

2 Related Works

The main related work of this paper can be divided into two parts: internet addiction analysis and campus data mining.

Internet Addiction Analysis. Internet addiction analysis is a research direction in psychology. Some works focus on the causes of internet addiction. Researchers have found that interpersonal difficulties, psychological factors, social skills, etc. all contribute to internet addiction [1, 3, 7]. Other works study the effects of internet addiction. Upadhayay et al. [8] claimed that excessive use of the internet harms academic performance. He et al. [5] explored the influence of internet addiction on sensitivity to punishment and reward. There are also works on the mechanisms by which internet addiction forms. Zhang et al. [12] studied the underlying reasons for the negative influence of family function on internet addiction. Zhao et al. [13] noted that stressful life events make users feel depressed, which in turn leads them to become addicted to the internet.

Campus Data Mining. Campus data mining refers to solving campus problems with data mining methods. Zhu et al. [14] propose an unsupervised method to calculate students’ procrastination value from their library borrowing records. Guan et al. [4] predict students’ financial hardship from their smart card usage, internet usage and trajectories on campus (the Dis-HARD model). Other work aims at analyzing students’ learning process and improving their performance in class. For example, Burlak et al. [2] identify whether a student is cheating in an exam by analyzing the student’s interaction data with online course systems, such as start time, end time, IP address and access frequency. To the best of our knowledge, no existing work analyzes internet addiction using students’ daily behavior, and we are the first to analyze internet addiction from behavior data with data mining methods.

3 Preliminaries

Internet addiction is an abstract concept in psychology, so it is hard to give it a measurable definition. To solve this problem, we first make a reasonable assumption about internet addiction. Then, based on this assumption, we calculate the internet addiction value from students’ behavior data.

3.1 Internet Addiction Assumption

Psychological research [6] shows that most college students are addicted to the internet. As mentioned above, internet addiction refers to excessive internet use that interferes with daily life. Students with different internet addiction levels are therefore very likely to spend different amounts of time online. In addition, different behaviors reflect different activities at school, which in turn lead to different online time. Students of different genders or departments also differ somewhat in their internet use.

Based on these facts, we assume that internet addiction is a hidden factor that, together with students’ behavior and profile information, influences their daily time online. We therefore learn this factor by modeling how students’ internet addiction and behaviors influence daily online time. To simplify the problem, we also assume that a student’s internet addiction level does not change within a semester.

3.2 Problem Formulation

Since we do not have any labels for internet addiction level, we cannot use a supervised method to learn students’ internet addiction value directly. We therefore need to estimate it from known data. Based on our assumption that the internet addiction value is a hidden variable that affects the time students spend online, the value can be learned by predicting students’ daily online time.

Formally, we define \(a_u \) as the internet addiction level of student u. The daily time online sequence of student u during a period T is represented as \(\{T_u(t)\}\), and the daily behavior sequence of u during the same period is represented as \(\{B_u (t)\}\), where \(t\in T\). We also define the personal profile information of student u as \(\{p_u \}\). Our task is to model the relationship \(\{a_u , p_u, B_u (t)\}\rightarrow \{T_u (t)\}\), that is, how students’ behaviors and internet addiction influence their daily time online. The internet addiction level \(a_u\) can then be calculated from this model.

4 Linear Internet Addiction (LIA) Model

In this section, we first introduce how we use a linear model to capture the relationship \(\{a_u , p_u, B_u (t)\}\rightarrow \{T_u (t)\}\). Then, to strengthen the model, we take the regularity of students’ behaviors into consideration.

4.1 Naive LIA

Based on the internet addiction assumption, behavior is a factor that influences students’ online time. We assume that the impact of behavior on online time does not differ between individuals, so all students share the same weight vector. We treat the different kinds of personal attributes in the same way. We further suppose that a different internet addiction level is the only reason why students with the same behavior and personal attributes spend different amounts of time online. This gives our naive linear internet addiction model:

$$\begin{aligned} y_u(t)=w^T x_u(t)+a_u \end{aligned}$$

\(y_u(t)\) represents the time student u spends online at time t. \(x_u(t)\) is the concatenation of the behavior vector and the personal attributes of student u at time t, and w is the weight vector for this combined vector. \(a_u\) is the internet addiction level of student u. Our task is to find the values of \(a_u\) and w that minimize the loss function, that is:

$$\begin{aligned} \mathop {\arg \min }_{w,a_u}\sum _{u\in U}\sum _{t\in T}(y_u(t)-w^T x_u(t)-a_u)^2 +\lambda ||w||^2 +\mu \sum _{u\in U}a_u^2 \end{aligned}$$

The term \(\lambda ||w||^2\) is used to prevent the model from overfitting. The term \(\mu \sum _{u\in U}a_u^2\) adjusts the relative weight between behavior and internet addiction.
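As a minimal sketch of how this objective could be minimized, the following code runs full-batch gradient descent over w and all \(a_u\). The function names `lia_loss` and `fit_naive_lia`, the data layout, the learning rate and the number of epochs are illustrative assumptions; the text does not state which optimizer was actually used.

```python
from typing import Dict, List, Tuple

# One record per student and day: (feature vector x_u(t), online time y_u(t)).
Record = Tuple[List[float], float]


def lia_loss(data: Dict[str, List[Record]], w: List[float],
             a: Dict[str, float], lam: float, mu: float) -> float:
    """Regularized squared-error objective of the naive LIA model."""
    loss = lam * sum(wi * wi for wi in w) + mu * sum(v * v for v in a.values())
    for u, records in data.items():
        for x, y in records:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + a[u]
            loss += (y - pred) ** 2
    return loss


def fit_naive_lia(data: Dict[str, List[Record]], lam: float = 0.6,
                  mu: float = 0.4, lr: float = 0.01, epochs: int = 2000):
    """Gradient descent on the objective above (assumed optimizer).

    The step size lr must be tuned to the data scale; 0.01 is only
    suitable for small, normalized toy inputs.
    """
    dim = len(next(iter(data.values()))[0][0])
    w = [0.0] * dim                  # shared behavior/profile weights
    a = {u: 0.0 for u in data}       # one addiction level per student
    for _ in range(epochs):
        grad_w = [2.0 * lam * wi for wi in w]
        grad_a = {u: 2.0 * mu * a[u] for u in data}
        for u, records in data.items():
            for x, y in records:
                err = sum(wi * xi for wi, xi in zip(w, x)) + a[u] - y
                for k in range(dim):
                    grad_w[k] += 2.0 * err * x[k]
                grad_a[u] += 2.0 * err
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        a = {u: a[u] - lr * grad_a[u] for u in data}
    return w, a
```

In this sketch, two students with identical behavior but different online time end up with different values of \(a_u\), which is the intended role of the hidden variable.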

4.2 LIA with Regular Behavior

College students usually have a fixed curriculum, so their behavior has some weekly regularity, which also leads to regularity in the time they spend online. Take student u as an example: the courses on Monday are rather boring, so he spends a lot of time surfing the internet. The courses on Tuesday, however, are hard, so he must pay attention in class and may not surf the internet. It is therefore necessary to take regular online time into consideration.

We therefore modify our linear internet addiction model by adding a term \(d_u(\pi (t))\) that represents the regular online time of student u at time t. Because of the weekly rhythm of college study, students show similar online habits every week. Here \(\pi (t)\) denotes the day of the week on which time t falls, and \(d_u(x)\) denotes the regular online time on the x-th day of the week. Our new model is:

$$\begin{aligned} y_u(t)=w^T x_u(t)+a_u+d_u(\pi (t)) \end{aligned}$$

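For the convenience of calculation, we define \(x_{2u}(t)\) as an 8-dimensional vector whose first entry is 1, corresponding to the internet addiction term, and whose remaining seven entries are a one-hot representation of the day of the week. Correspondingly, \(w_u=(a_u,d_u(1),\ldots ,d_u(7))\) collects the internet addiction level and the regular online time of student u, so that \(w_u^T x_{2u}(t)=a_u+d_u(\pi (t))\). The formula above is then equivalent to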

$$\begin{aligned} y_u(t)=w^T x_u(t)+w_u^T x_{2u}(t) \end{aligned}$$

with \(x_{2u}(t)\) being equal to:

$$\begin{aligned} x_{2u}(t)=(1,\pi _1(t),\pi _2(t),\pi _3(t),\pi _4(t),\pi _5(t),\pi _6(t),\pi _7(t)) \end{aligned} \tag{1}$$

where

$$ \pi _i(t)={\left\{ \begin{array}{ll} 1, &{} \text {if } \pi (t)=i ;\\ 0, &{} \text {otherwise.} \end{array}\right. } $$

Our task is to find w and \(w_u\) that minimize the loss function, where the first entry of \(w_u\) is the internet addiction level of student u:

$$\begin{aligned} \mathop {\arg \min }_{w,w_u}\sum _{u\in U}\sum _{t\in T}(y_u(t)-w^T x_u(t)-w_u^T x_{2u}(t))^2+\lambda ||w||^2+\mu \sum _{u\in U}||w_u||^2 \end{aligned}$$

Similarly, the term \(\lambda ||w||^2\) prevents the model from overfitting, and the term \(\mu \sum _{u\in U}||w_u||^2\) adjusts the relative weight between the shared part (behavior and personal attributes) and the individual part (internet addiction level and regular habit).
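To make the construction of the private input concrete, the sketch below builds the 8-dimensional vector for a calendar date and evaluates the model's prediction. The helper names `private_input` and `predict_online_time` are hypothetical, and mapping Monday to the first one-hot position is an assumption; the text does not say which day \(\pi (t)=1\) denotes.

```python
from datetime import date
from typing import List


def private_input(day: date, use_regularity: bool = True) -> List[float]:
    """Return x_2u(t): a leading 1 for the addiction term, then a
    one-hot encoding of the day of the week."""
    if not use_regularity:
        return [1.0]                  # naive LIA: addiction term only
    one_hot = [0.0] * 7
    one_hot[day.weekday()] = 1.0      # assumed convention: Monday -> slot 0
    return [1.0] + one_hot


def predict_online_time(w: List[float], x: List[float],
                        w_u: List[float], day: date) -> float:
    """Evaluate y_u(t) = w^T x_u(t) + w_u^T x_2u(t)."""
    x2 = private_input(day)
    shared = sum(wi * xi for wi, xi in zip(w, x))
    private = sum(wi * xi for wi, xi in zip(w_u, x2))
    return shared + private
```

With this layout, `w_u[0]` plays the role of \(a_u\) and `w_u[1:]` holds the seven weekday offsets \(d_u(\cdot )\).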

5 Neural Network Internet Addiction (NIA) Model

In this section, we develop a neural network internet addiction (NIA) model to represent the non-linear influence of students’ behaviors on their online time.

5.1 Network Structure

The neural network consists of two parts: a public part and a private part. The public part represents the effect of behavior and personal attributes on daily online time, which is assumed to be the same for all individuals; its weight matrix and threshold vector are updated in every iteration. Because the internet addiction level and regular behavior differ between individuals, we use a private part to capture them. Every student has their own weight matrix and threshold vector, and these parameters are updated only when the corresponding student’s data are used as input. The private input of student u at time t is the same as vector (1). To ignore the influence of regular behavior, we can keep only the first entry of vector (1).

The structure of the network is shown in Fig. 1.

Fig. 1. Neural network internet addiction model.
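The forward pass of such a two-part network might look like the following sketch. The class name `NIASketch`, the weight initialization, and in particular the choice to concatenate the public and private hidden units before a single output unit are assumptions made for illustration; the exact wiring is given only by the figure. The identity hidden activation, the tanh output and the layer sizes follow the settings reported in the experiments.

```python
import math
import random
from typing import List


def _random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def _linear(x: List[float], V: List[List[float]], b: List[float]) -> List[float]:
    """Hidden layer with identity activation f(x) = x."""
    return [sum(x[i] * V[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


class NIASketch:
    """Public part shared by all students, private part per student."""

    def __init__(self, n_features: int, student_ids: List[str],
                 n_public: int = 10, n_private: int = 2,
                 private_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Public weights: one matrix and threshold vector for everyone.
        self.V_pub = _random_matrix(n_features, n_public, rng)
        self.b_pub = [0.0] * n_public
        # Private weights: one matrix and threshold vector per student.
        self.V_priv = {u: _random_matrix(private_dim, n_private, rng)
                       for u in student_ids}
        self.b_priv = {u: [0.0] * n_private for u in student_ids}
        # Assumed output layer over the concatenated hidden units.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_public + n_private)]
        self.b_out = 0.0

    def forward(self, u: str, x: List[float], x2: List[float]) -> float:
        """Predict normalized online time for student u from public input x
        and private input x2 (vector (1))."""
        hidden = (_linear(x, self.V_pub, self.b_pub)
                  + _linear(x2, self.V_priv[u], self.b_priv[u]))
        z = sum(h * w for h, w in zip(hidden, self.w_out)) + self.b_out
        return math.tanh(z)
```

During training, only `V_priv[u]` and `b_priv[u]` of the student whose record is fed in would receive gradient updates, while the public parameters are updated on every record.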

5.2 Internet Addiction Calculation

After the neural network has been trained, we take the sum of the contributions that the internet addiction input makes to the private hidden units as the student’s internet addiction level. The value is calculated as follows:

$$\begin{aligned} a_u=\sum _{j=1}^{q_u}V_{uij} \end{aligned}$$

\(q_u\) is the number of private hidden units. i is the index of the internet addiction entry in the private input vector; here the index is one. \(V_u\) is the weight matrix connecting the input layer and the hidden layer of the private part, and \(V_{uij}\) is the entry in the i-th row and j-th column of \(V_u\).
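In code, this calculation reduces to summing one row of the private weight matrix. The sketch below assumes \(V_u\) is stored as a list of rows and uses a zero-based index, so the text's index one corresponds to `addiction_index=0`; the function name is hypothetical.

```python
from typing import List


def addiction_level(V_u: List[List[float]], addiction_index: int = 0) -> float:
    """Sum the weights from the addiction input to all private hidden units.

    V_u[i][j] is the weight from private input i to private hidden unit j.
    """
    return sum(V_u[addiction_index])
```

For a private part with two hidden units, this adds the two weights leaving the constant addiction input.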

6 Experiments

6.1 Data Description

Our data come from a Chinese college and include students’ consumption records in the school restaurant and internet access records. They also include students’ personal attributes such as department, gender and age.

Each consumption record consists of the student’s ID and the time, place and amount of the transaction. Because students’ consumption behavior is varied, we first divide the places into categories and then extract from the consumption records the amounts spent on dining, snacks, showers and deposits, as well as the total amount spent per hour. We also count each student’s daily consumption frequency.

Table 1. Features used in experiments

In addition, because students can access the internet through campus wifi only after they are authenticated, we use the authentication records to extract each student’s wifi access time per hour. Similarly, each time a student visits a website, a connection record is generated, and when the visit ends, a disconnection record is generated. From these records, we extract the student’s actual online time and the average gap between two consecutive internet accesses per day. We also represent every student’s profile information with one-hot encoding.
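The extraction of online time and average access gap could be sketched as follows. The input layout (a list of connect/disconnect timestamp pairs for one student on one day), the function name `daily_online_features` and the merging of overlapping sessions are assumptions; the text does not describe how overlapping visits are handled.

```python
from datetime import datetime
from typing import List, Tuple


def daily_online_features(
        sessions: List[Tuple[datetime, datetime]]) -> Tuple[float, float]:
    """Return (online seconds, average gap in seconds) for one student-day.

    Each session is a (connect, disconnect) pair. Overlapping sessions are
    merged first so that parallel visits are not counted twice (assumed).
    """
    if not sessions:
        return 0.0, 0.0
    merged: List[List[datetime]] = []
    for start, end in sorted(sessions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    online = sum((end - start).total_seconds() for start, end in merged)
    # A gap runs from the end of one access to the start of the next.
    gaps = [(merged[k + 1][0] - merged[k][1]).total_seconds()
            for k in range(len(merged) - 1)]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return online, avg_gap
```

Returning a gap of zero for days with a single access is a placeholder choice; a real pipeline might prefer a missing-value marker.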

For some reason, we do not have students’ internet access records in dormitories and the library. The student activities covered by our data are thus mainly centered on classrooms, canteens and some student activity centers. In class, students need to listen to the teacher most of the time, and in the canteen they often use their phones to pass the time. Therefore, the actual online time we extract mainly reflects entertainment. Intuitively, entertainment time is suitable for calculating the internet addiction level.

We select the records of undergraduate students who enrolled in 2016, covering September 1st, 2018 to November 11th, 2018. After dropping students with fewer than 35 days of records, 2341 students remain. The first 50 records of each student are used for training and the remaining records for testing. Students’ profile representation and daily behavior vector are shown in Table 1.
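The filtering and chronological split described here can be sketched as below. The record layout (a tuple whose first element is a sortable day key) and the function name `split_records` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Assumed record layout: (day key, feature vector, online time).
DayRecord = Tuple[int, List[float], float]


def split_records(records_by_student: Dict[str, List[DayRecord]],
                  min_days: int = 35, n_train: int = 50):
    """Drop students with too few days, then split each student's records
    chronologically: the first n_train for training, the rest for testing."""
    train: Dict[str, List[DayRecord]] = {}
    test: Dict[str, List[DayRecord]] = {}
    for u, recs in records_by_student.items():
        if len(recs) < min_days:
            continue
        ordered = sorted(recs, key=lambda r: r[0])
        train[u] = ordered[:n_train]
        test[u] = ordered[n_train:]
    return train, test
```

Note that under these thresholds a student with between 35 and 50 days of records passes the filter but contributes no test records.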

6.2 Internet Addiction Calculation

The LIA and NIA models learn the internet addiction level by predicting students’ daily online time. To show the correctness of our models, we conduct several experiments.

For each model, we conduct three experiments. The first removes the internet addiction and regular behavior parts of the LIA and NIA models and predicts students’ daily online time using only their behavior data and profile information; it serves as a baseline. The second takes only internet addiction into consideration. The last takes both internet addiction and regular behavior into consideration.

For the linear model, \(\lambda \) is set to 0.6 and \(\mu \) is set to 0.4. For the neural network model, the activation function of the hidden layer is \(f(x)=x\), and the activation function of the output layer is \(f(x)=\tanh (x)\). The number of public hidden units is 10, and the number of private hidden units is 2. The learning rate is set to 0.01, and the number of epochs is 40. The MSE of each method is shown in Table 2. Note that ‘ia−’ refers to the baseline experiment, ‘ia’ to the second experiment, and ‘ia+’ to the third experiment.

Table 2. Regression results

The results in Table 2 show that, for both the linear model and the neural network model, prediction accuracy improves under our internet addiction assumption. These results support the correctness of the assumption. However, adding the regular behavior assumption does not improve accuracy compared with the results without it. One possible reason is that students’ behavior has some volatility that LIA and NIA are not able to model. In general, the results of the neural network model are worse than those of the linear model, perhaps because the linear model is already expressive enough to represent the relationship between students’ behavior, internet addiction and online time.

6.3 Internet Addiction Verification

In this section, we devise regression and classification tasks to verify the correctness of the internet addiction value learned by the proposed models.

Based on our assumption, internet addiction is a hidden variable that influences students’ daily time online. The learned internet addiction value should therefore be a useful feature for predicting students’ online time. We devise two tasks to verify the validity of the learned internet addiction value.

The aim of the regression task is to predict students’ daily online time. The baseline experiment takes the daily behavior vector and profile information as input. The contrast experiment predicts the daily online time using students’ internet addiction value, daily behavior vector and profile information. The classification task is similar: its aim is to predict which online time interval a day belongs to, and the experimental settings are the same as in the regression task. The methods used in both tasks are decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), AdaBoost, gradient boosting decision tree (GBDT) [10], bagging and extremely randomized trees (ET). MSE is used as the evaluation metric for the regression task, and F1-score for the classification task.
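The two input settings and the two evaluation metrics can be written down as in the sketch below. The helper names are hypothetical, and the F1 function assumes a binary labeling of online time intervals; the number of intervals used in this task is not stated in the text.

```python
from typing import List, Sequence


def with_addiction(features: Sequence[float], a_u: float) -> List[float]:
    """Contrast setting: append the learned addiction value to the
    baseline input (behavior vector plus profile information)."""
    return list(features) + [a_u]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error for the regression task."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int],
                    positive: int = 1) -> float:
    """F1-score for a two-interval classification task (assumed binary)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Any of the listed learners can then be trained once on the baseline features and once on the output of `with_addiction`, and the two scores compared.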

Table 3. Regression task
Table 4. Classification task

Note that ‘ia−’ refers to the baseline experiment, ‘ia(LIA)’ stands for the experiment with the internet addiction value learned by naive LIA, and ‘ia(NIA)’ represents the experiment with the best internet addiction value learned by NIA without regular behavior consideration.

From Table 3, we observe that the SVM model has a very large mean squared error on the regression task. One possible reason is that it is not suitable for this task, so we ignore the SVM results in the discussion below. After adding the internet addiction value calculated by LIA, the prediction accuracy of all methods improves. After adding the internet addiction value calculated by NIA, some methods also improve. For the classification task, whichever internet addiction value is added to the behavior vector, the performance of SVM and AdaBoost does not change, while the performance of all other methods improves evidently (Table 4).

In general, after adding the internet addiction value calculated by LIA or NIA, both the regression and classification tasks improve markedly, which shows the effectiveness of the internet addiction value learned by the proposed models.

6.4 Internet Addiction Analysis

To show the internet addiction situation in college, we analyze the distribution of internet addiction. Because the naive LIA model achieves the best prediction accuracy when learning students’ internet addiction value, the following analysis is based on the values calculated by naive LIA.

Fig. 2. Internet addiction distribution. (a) Number of students with different levels of dependency. (b) Some students with values greater than 0.6 or less than 0.2 are removed.

Internet Addiction Distribution. Figure 2(a) illustrates the number of students with respect to the calculated internet addiction value. The greater the internet addiction value, the more serious the student’s addiction to the internet. The distribution is similar to a normal distribution. To show the distribution more clearly, we remove values greater than 0.6 or less than 0.2, which gives Fig. 2(b). If we regard an internet addiction value below 0.4 as normal, Fig. 2(b) shows that most students are addicted to the internet to varying degrees.

Internet Addiction Differences Among Groups. To reveal the differences in internet addiction between genders, we compute the average internet addiction value and the average online time for each gender. Fig. 3 shows that girls spend more time on the internet than boys.

However, boys are more addicted to the internet than girls. This result is consistent with a finding in psychology. Wei et al. [9] investigated internet addiction among college students at Hubei Polytechnic University using questionnaires. They point out that boys are usually less skilled at communication and that network-mediated communication is easier to control; that is, it lets them improve the quality and quantity of their communication, which meets their communication needs. In addition, girls are better than boys at time management and manage their internet use time more reasonably. As a result, boys are more addicted to the internet than girls [9]. This consistency with psychological findings further supports the correctness of the internet addiction value we learned.

Fig. 3. Differences of online time and internet addiction between different genders.

Fig. 4. Differences of internet addiction among different departments and disciplines.

Figure 4(a) illustrates the average internet addiction level of different departments. In general, apart from a few departments whose internet addiction level is extremely high, the level fluctuates around 0.4. We further analyze statistically the differences in internet addiction level among students in different disciplines. Fig. 4(b) shows that there is no significant difference in internet addiction level among students in different disciplines. This result is also consistent with the psychological finding in [9]: the experiments conducted by Wei et al. demonstrate that the difference in internet addiction among disciplines is not significant. This consistency with psychological findings is further evidence of the effectiveness of the internet addiction value we learned.

Effect of Internet Addiction on Online Time. To show the role internet addiction plays in predicting students’ online time, we extract students’ daily behavior and conduct two binary classification experiments using the decision tree method: one predicts the online time interval from daily behavior, and the other predicts it from daily behavior and the internet addiction value. Since the whole tree is too big to show here, we select two representative branches. Note that all values are normalized.

Fig. 5. Decision tree with behavior and internet addiction value. (a) Prediction with behavior data. (b) Prediction with behavior data and the internet addiction value learned through naive LIA.

Figure 5(a) shows that wifi access time and average internet access gap are important features for predicting online time. This is consistent with the intuition that less wifi access time and longer internet access gaps lead to less online time. Figure 5(b) illustrates that the internet addiction value is critical for predicting daily online time. In particular, in this branch, a relatively low internet addiction value leads to short online time.

7 Conclusion

In this paper, we estimate college students’ internet addiction level quantitatively using their behavior data on campus. Specifically, we define the internet addiction value as a hidden variable that affects students’ online time, and formulate the problem as a regression problem. Along this line, we propose a linear internet addiction (LIA) model and a neural network internet addiction (NIA) model, both of which take students’ regular behavior into consideration. Finally, we conduct extensive experiments on a real-world dataset from a Chinese college. The results demonstrate the effectiveness of our models, and the analysis results are consistent with some psychological findings, which further supports the correctness of the proposed models.