1 Introduction

Academic analytics is a branch of modern data analysis that uses statistical analysis and data mining methods to reveal and recognize hidden patterns in vast educational databases [1,2,3,4,5,6]. Such patterns shed light on educational aspects related to student behavior, prediction, student-centric learning, remediation, and learning outcomes with high accuracy. This can help raise the standards of the Indian higher education system [6]. Owing to digitization and the effective use of computers and ICT, educational organizations, institutions, and universities have generated and stored large volumes of data in their databases [7–13]. These data can be a key source for future decision-making if they are processed through academic analytics. We took it as a challenge to uncover the business intelligence, patterns, correlations, and rules embedded in these data. Our work is interdisciplinary, undertaken by three schools of our university, because performance analysis overlaps with educational pedagogy, statistics, and computer-enabled technologies. The academic analytics was implemented using SPSS software [14, 15].

A closed questionnaire with predefined answers, printed on a single-sided A4 sheet, was used for data gathering [16]. The performance-related economic, social, and emotional attributes in this questionnaire were selected with the help of the School of Educational Sciences and in accordance with the theory of Pritchard and Wilson [16, 17]. The questionnaire was revised a number of times to make it easier to understand and simpler to answer, and it was tested on a subset of students after every revision. An Excel sheet was prepared for the answers using codes such as 0, 1, 2. Because the dataset carried personal information about students, confidentiality issues were properly addressed. The error rate during preprocessing was 38%, which was finally reduced to 5% after the students had been suitably persuaded. The questionnaire and the resulting data set are shown in Figs. 1 and 2.

Fig. 1 Sample questionnaire

Fig. 2 Data set in MS-Excel
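The coding of answers as 0, 1, 2 can be illustrated with a minimal Python sketch. It is not the authors' Excel procedure: the variable names GENDER, REGION, and INTERNET are taken from the SPSS syntax in Sect. 2.1, but the answer categories, the codebook, and the helper functions are hypothetical and serve only to show the idea of coding answers and tracking an error rate.

```python
# Hypothetical codebook: the paper states only that answers were coded
# as 0, 1, 2; the categories below are assumptions for illustration.
CODEBOOK = {
    "GENDER": {"male": 0, "female": 1},
    "REGION": {"urban": 0, "rural": 1, "foreign": 2},
    "INTERNET": {"no": 0, "yes": 1},
}


def encode_response(response, codebook=CODEBOOK):
    """Map one questionnaire response to integer codes.

    Returns (coded_row, errors). Unknown or missing answers are coded
    as None and their variable names are listed in errors.
    """
    coded, errors = {}, []
    for variable, mapping in codebook.items():
        answer = str(response.get(variable, "")).strip().lower()
        if answer in mapping:
            coded[variable] = mapping[answer]
        else:
            coded[variable] = None
            errors.append(variable)
    return coded, errors


def error_rate(responses, codebook=CODEBOOK):
    """Fraction of responses with at least one invalid or missing answer."""
    if not responses:
        return 0.0
    bad = sum(1 for r in responses if encode_response(r, codebook)[1])
    return bad / len(responses)
```

Reporting the invalid variables per response, rather than silently dropping the row, makes it possible to return to the respondent and correct the sheet, which is one plausible way an error rate could be brought down between revisions.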

2 Experimentations and Discussion

Our aim was to discover invisible attributes related to the performance of students. After discussions with educationalists, we understood that semester-end marks alone cannot be taken as the main indicator of a student's performance, because performance is an indistinct term. To ground this understanding, we surveyed the literature, including Shoukat Ali et al. [4], Graetz [18], Considine and Zappala [19], and Bratti and Staffolani [20]. This survey helped us identify the personal, social, and economic attributes used in our study.

With these preliminary investigations and understanding, we decided to identify the key variables that accelerate or degrade educational performance at large. We expected that the economic and social conditions of students would be important variables in our dataset/questionnaire as far as performance was concerned. To verify this, many variables and their interrelations needed to be analyzed. This is always the case for questionnaires, as they consist of many questions and each question contributes one variable [7, 21–23]. Studying all variables and their interrelations can be complicated and may divert attention from the original research focus. Factor analysis was developed for such exploratory analysis [22]. We used SPSS 22.0 to analyze the data set. The snapshots given below show the evidence of empirical analysis. Descriptive statistics were produced with MS-Excel to represent our data in diagrammatic form. Some of the interesting facts are shown in Figs. 3, 4, 5 and 6. Further, canonical correlation analysis and chi-square testing were performed on the experimental data set.

Fig. 3 Region-wise distribution

Fig. 4 Diversity in jobs among parents

Fig. 5 Parents versus their education level

Fig. 6 Students versus their family size

2.1 Program Code

SPSS 22.0 was used to analyze the data set [16] with the following syntax.

    FREQUENCIES VARIABLES=GENDER MARRIED AGE REGION UG FEDU FJOB
        FINCOM MEDU MJOB MINCOM FSIZE FRELATIONS FSUPPORT REASON TMODE
        TTIME STIME FAILURES TUTORIAL SCHOLERSHIP PJOB MM HARDSUB_UG
        STUDY_HOME SELFLIB SELFPC PLACELVING INTERNET F_T_STUDY
        F_T_FRIEND MOVIEPWEEK CAREERDREM PARALLELCOURSE OWN_NOTES
        FREE_T_ACC PER_SATISF MATERIAL HLT_STATUS
      /ORDER=ANALYSIS.

    CROSSTABS
      /TABLES=REGION GENDER BY FAILURES STIME SCHOLERSHIP PJOB
        SELFLIB SELFPC PLACELVING INTERNET F_T_STUDY F_T_FRIEND
        MOVIEPWEEK OWN_NOTES PER_SATISF MATERIAL
      /FORMAT=AVALUE TABLES
      /STATISTICS=CHISQ
      /CELLS=COUNT
      /COUNT ROUND CELL.
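For readers without SPSS, the counting performed by FREQUENCIES and by CROSSTABS with /CELLS=COUNT can be approximated in plain Python. The sketch below is an illustration under stated assumptions, not a reproduction of SPSS output: the functions `frequencies` and `crosstab` are hypothetical helpers, and the coded rows are made up. Only the variable names REGION and SELFPC come from the syntax above.

```python
from collections import Counter


def frequencies(rows, variable):
    """Count how often each code of `variable` occurs (cf. FREQUENCIES)."""
    return Counter(row[variable] for row in rows)


def crosstab(rows, row_var, col_var):
    """Build a contingency table of counts (cf. CROSSTABS /CELLS=COUNT).

    Returns (row_levels, col_levels, table), where table[i][j] is the
    number of rows with row_levels[i] and col_levels[j].
    """
    counts = Counter((row[row_var], row[col_var]) for row in rows)
    row_levels = sorted({r for r, _ in counts})
    col_levels = sorted({c for _, c in counts})
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Made-up coded rows, for illustration only (not the study's data).
rows = [
    {"REGION": 0, "SELFPC": 1},
    {"REGION": 0, "SELFPC": 1},
    {"REGION": 0, "SELFPC": 0},
    {"REGION": 1, "SELFPC": 0},
    {"REGION": 1, "SELFPC": 0},
    {"REGION": 1, "SELFPC": 1},
]

region_counts = frequencies(rows, "REGION")
levels_r, levels_c, table = crosstab(rows, "REGION", "SELFPC")
```

The contingency table returned by `crosstab` is the input that a chi-square test of independence, requested in SPSS by /STATISTICS=CHISQ, operates on.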

Descriptive statistics were produced with MS-Excel to represent our data in diagrammatic form. Figures 3, 4, 5 and 6 show the distribution of the data by region, by diversity of parents' jobs, by parents' education level, and by family size, respectively. Indian students from urban and rural backgrounds were found in approximately equal numbers, as compared with foreign students. Differences in student performance were observed according to the parents' jobs and educational backgrounds. The number of members in each student's family is represented by a bar plot.

2.2 Canonical Correlation Analysis

The core objective is to find the relationship between students' personal details and their family background. We formed two groups of variables for this analysis. The first group, the student's details, contains three parameters: gender, age, and UG percentage. The second group, the family background, contains eight parameters: father's education, father's job, father's income, mother's education, mother's job, mother's income, family size, and whether the student holds a part-time job. Canonical correlation analysis was used to determine the associations between these two sets of variables. The analysis yielded significant outcomes.
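For reference, the standard textbook formulation of canonical correlation analysis is sketched below in LaTeX (the \operatorname command assumes the amsmath package). The symbols are introduced here for illustration and do not appear in the original text: x denotes the vector of the three student variables, y the vector of the eight family-background variables, and \Sigma_{xx}, \Sigma_{yy}, \Sigma_{xy} their covariance and cross-covariance matrices, with \Sigma_{yx} = \Sigma_{xy}^{\top}.

```latex
% x: the p = 3 student variables (gender, age, UG percentage)
% y: the q = 8 family-background variables
% First canonical correlation: the largest correlation attainable
% between a linear combination of x and a linear combination of y.
\[
\rho_1 \;=\; \max_{a \in \mathbb{R}^{p},\; b \in \mathbb{R}^{q}}
\operatorname{corr}\!\left(a^{\top}x,\; b^{\top}y\right)
\;=\; \max_{a,\,b}\;
\frac{a^{\top}\Sigma_{xy}\,b}
     {\sqrt{a^{\top}\Sigma_{xx}\,a}\;\sqrt{b^{\top}\Sigma_{yy}\,b}}
\]
% The squared canonical correlations are the eigenvalues of the
% matrix below, so at most min(p, q) = 3 canonical pairs exist.
\[
\Sigma_{xx}^{-1}\,\Sigma_{xy}\,\Sigma_{yy}^{-1}\,\Sigma_{yx}\,a \;=\; \rho^{2}\,a
\]
```

With three variables in the first group and eight in the second, the analysis can produce at most three pairs of canonical variates, each with its own canonical correlation.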

2.3 Chi-Square Analysis

A sample analysis using chi-square tests is presented here. The remaining results were computed in a similar way and are summarized in the conclusion. The figures and tables below show the use of descriptive statistics. Together, they present data on the diversity of the students by course, gender, undergraduate background, father's occupation, and family size. We applied the chi-square test to assess significance, under the assumption that there would be significant differences among the variables under study.
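The Pearson chi-square test of independence used here can be sketched in a few lines of Python. This is an illustration rather than the SPSS procedure the authors ran: the function names are hypothetical, the 2x2 counts are made up, and the p-value helper is valid only for one degree of freedom. The sketch also assumes that every row and column total is non-zero.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for a table of counts.

    Returns (statistic, degrees_of_freedom, expected_counts).
    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


def p_value_df1(statistic):
    """Upper-tail chi-square p-value, valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical 2x2 counts (e.g. two regions by owning a PC or not);
# these are not the study's data.
observed = [[30, 10], [20, 40]]
statistic, dof, expected = chi_square_independence(observed)
p_value = p_value_df1(statistic)
```

For tables with more than one degree of freedom, the statistic would be compared against the appropriate chi-square critical value or evaluated with a statistics library instead of `p_value_df1`.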

Some of the pairs of parameters that show significant differences in our study are: scholarship holding and gender; career dreams and gender; UG percentage and gender; UG percentage and region; scholarships and age group; UG percentage and age group; students and their father's education; students and their father's job; gender and mother's education; age and family size; age and part-time job; region and father's education; region and family size; and place of living and own library. We further analyzed the data using chi-square tests in SPSS 22.0 [15] and found some significant results, which are presented in the form of tables.

The region-wise study considered variables such as place of living, whether students have their own PC, whether they use the internet, and how much free time they have for study. It was surprising to note that there are significant differences with respect to students' places of living. These differences arise from the students' awareness of internet use. Our students are from the computer science field, and hence they are expected to use the internet frequently; our experimental analysis found this to be true. With regard to free time for study, a significant amount of time was found to be available. It was assumed that, during their studentship, students would have sufficient time for study rather than for other work. This holds particularly true because the Nanded region is neither a metro city nor an industrial hub.

In the gender-wise study of the amount of scholarship received, a significant difference was observed: male students receive more scholarship than female students. We also found a significant gender-wise difference in place of living. Most of the female students preferred to live at their own home or in hostels due to safety issues. Tables 1, 2, 3, 4 and 5 show these results.

Table 1 Chi-square tests analysis for region versus students having self PC
Table 2 Chi-square tests analysis for region versus place of living
Table 3 Chi-square tests analysis for region versus free time to study
Table 4 Chi-square tests analysis for region versus students having self PC
Table 5 Chi-square tests analysis for gender versus place of living

3 Conclusion

Student performance is a fuzzy term and is affected by many parameters. In this study, our data reveal that it is influenced by the social and economic conditions of students. However, no scientific evidence previously existed for such an outcome. We took this as a challenge and found that a student's performance does not depend merely on his/her studious nature. This paper shows the effective use of academic analytics in terms of descriptive statistics. We applied canonical correlation analysis and the chi-square test to assess significance with respect to the stated objectives and assumptions. Finally, we discovered new, otherwise invisible variables that hamper the performance of students.