1 Introduction

The Indian higher education system is largely based on University Grants Commission (UGC) guidelines for universities, which have a large number of affiliated colleges across the country. The UGC also acts as the regulator for general higher education in the country [1]. Similarly, the All India Council for Technical Education (AICTE) guides a large number of engineering colleges, polytechnics, technical universities, and their affiliated colleges and acts as the regulator for engineering and other professional education. Certificate-level skill and vocational courses are largely run by Industrial Training Institutes (ITIs) under the guidelines of the Directorate General of Employment and Training (DGET) and are regulated by the National Council for Vocational Training (NCVT). The Directorates of Technical Education (DTEs) are state government organizations that manage the technical education system at state level under the guidelines of the federal statutory bodies referred to above, namely UGC, AICTE, DGET, and NCVT. Tens of thousands of colleges are affiliated to universities and state boards of technical education, and millions of students search for courses of their choice and for financial assistance to fund their education. Many federal and state-funded financial assistance schemes are available for students of different economic status, such as schemes for the weaker sections of society, namely scheduled castes (SC) and other backward classes (OBC), schemes for minority communities, and merit-cum-means scholarships. Universities, colleges, polytechnics, and industrial training institutes need solutions that can handle this large amount of student data, ranging from enrollments, scholarships, academic content, literature search, testing, and results to placements, and that can analyze the data for future decision-making.
Such educational institutions generate huge volumes of data, from grades and test scores to enrollment numbers, scholarships, and placement tracking. With the advent of online courses offered by many universities, the amount of data available to educational officials and students has exploded [2]. Various database management frameworks and analytical software help in identifying relevant pedagogic approaches. New frameworks are needed to support data mining approaches that increase the efficacy of educational institutions. For providing financial assistance to underrepresented students, a new framework for the computerization of Punjab Technical Education is proposed that is more secure, cost-effective, reliable, and easy to use and that can handle ever-increasing storage requirements. Modern open-source tools such as MongoDB, coupled with cloud technology, are used, and comparisons are made with conventional technologies. The results are summarized in the later sections. NoSQL databases are inherently schema-less and highly scalable. In addition, advances in information and communication technology and faster Internet access make it much easier for institutions such as schools, colleges, and universities to reach out to more students and to attract them for admissions, academics, scholarships, and related activities. The data generated in the technical and engineering education system may be further classified into data related to financial assistance and scholarships, admissions, academics, and evaluation. Universities and other educational institutions are working intensively to identify relevant talent pools and to design new courses that appeal to students on the basis of such data analysis. Scholarships and financial assistance options are available for the weaker sections of society and for minority communities through federal and state funding.
As per the 2011 census of India, out of a total population of 1.2 billion, the scheduled castes and tribes (SC/ST), backward classes, and minority communities accounted for major shares. As shown in the pie chart in Fig. 1, the scheduled caste population was about 19.59% and the scheduled tribe (ST) population was 8.63% of the total population of India. The other backward classes (OBC) population was 40.94%. Similarly, the population of minority communities in India was about 20.5%, which is a considerable share, as shown in Fig. 2. Most scholarship and financial assistance schemes were designed to uplift these weaker and underrepresented sections of society and the minority communities with low incomes. Millions have been spent on funding the education of such students year after year throughout the country, with a federal-to-state spending ratio of 90:10. The bar chart in Fig. 3 shows the students applying for financial assistance and scholarship schemes in the state of Punjab alone, which has about 1800 affiliated colleges, polytechnics, and Industrial Training Institutes. The numbers are much higher at country level. Annual spending on such social justice schemes is more than INR 60000000 in the state of Punjab alone. India consists of about 30 such states, and similar social welfare schemes run in all of them, covering a large number of students. When such a large number of students apply for scholarships and financial assistance, it is very difficult to eliminate duplicate, fake, suspicious, and unreliable data arriving from so many sources year after year with every new intake. Education systems elsewhere in the world may already be using advanced tools and practices for business intelligence, financial analytics, and predictive analytics to keep their strategic management effective.
The data sources may include students’ personal information, results, certificates, past educational qualifications and institutions, parental income, and dropout rates, as well as sensitive information such as the social security number/UID (unique identification) number and bank account number.

Fig. 1 Caste-based population, 2011

Fig. 2 Community-based population, 2011

Fig. 3 Students applying for financial assistance year wise in the State of Punjab

2 Financial Assistance Management

2.1 Risk Detection

Data security and information integrity are a major challenge for institutional data, as the personal data of applicants can be stolen online. For example, if the national identity number (UID) or bank account details of a student are stolen by hackers, the applicant can suffer financial loss. Leakage of such personal and classified data can lead to various scams. Risk detection and analysis, together with security techniques such as modern encryption algorithms, are therefore proposed to be built into the data mining system [3].
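As an illustration of how sensitive fields might be protected before records enter an analysis pipeline, the following minimal sketch replaces the UID with a keyed hash and masks the bank account number for display. This is keyed hashing (HMAC-SHA-256) and masking, not the encryption scheme of the proposed system, which the text does not specify; the key, field names, and sample record are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a secure key store.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest that stands in for a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, e.g. for on-screen display."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Hypothetical applicant record (all values are made up).
record = {"name": "Student A", "uid": "123456789012", "account": "00112233445566"}

# Record that is safe to pass on for analysis: no raw UID or account number.
safe_record = {
    "name": record["name"],
    "uid_token": pseudonymize(record["uid"]),
    "account_masked": mask(record["account"]),
}
print(safe_record)
```

Because the same UID always yields the same token under a given key, tokens can still be used to link records of one student without exposing the UID itself.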

2.2 Performance Prediction

Whether a student is continuing his/her studies after availing the benefit of financial assistance needs to be ascertained before the scholarship application for the next semester/year is granted. His/her board/university scores need to be linked to the database management system using suitable data tools. If he/she does not appear in or does not pass any of the subjects, the application is liable to be rejected until he/she passes the requisite number of subjects and reapplies for the scholarship of the next semester/year. In the proposed study, data alerts have been implemented for this purpose. Dropout rates can also be ascertained while analyzing the data, so that decision-making on further awards of scholarships can be improved.
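The renewal rule described above can be sketched as follows. The record layout, the `required_passes` threshold, and the treatment of a student who did not appear in any subject as a dropout are assumptions made for illustration only; the source does not define them.

```python
from dataclasses import dataclass


@dataclass
class SemesterResult:
    uid_token: str           # hypothetical pseudonymous student identifier
    subjects_appeared: int   # subjects in which the student sat the examination
    subjects_passed: int     # subjects passed in that examination


def renewal_allowed(result: SemesterResult, required_passes: int) -> bool:
    """Allow renewal only if the student appeared and passed enough subjects."""
    if result.subjects_appeared == 0:
        return False
    return result.subjects_passed >= required_passes


def dropout_rate(results: list[SemesterResult]) -> float:
    """Fraction of students who did not appear in any subject (assumed proxy)."""
    if not results:
        return 0.0
    dropped = sum(1 for r in results if r.subjects_appeared == 0)
    return dropped / len(results)


# Hypothetical results linked from a university/board server.
results = [
    SemesterResult("T001", subjects_appeared=6, subjects_passed=5),
    SemesterResult("T002", subjects_appeared=6, subjects_passed=2),
    SemesterResult("T003", subjects_appeared=0, subjects_passed=0),
]
alerts = [r.uid_token for r in results if not renewal_allowed(r, required_passes=4)]
print(alerts, dropout_rate(results))
```

The `alerts` list plays the role of the data alert: it names the applications that should be held back until the requisite number of subjects has been passed.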

2.3 Data Visualization

Technical educational data become more and more complex as they grow in size. Data visualization techniques make it easy to identify trends and relations in the data simply by looking at visual reports.

2.4 Intelligent Feedback

Learning systems can provide intelligent and immediate feedback to students in response to their inputs, which improves student interaction and performance. It is proposed to implement a new framework that links each scholarship application's transactions from submission until approval.

2.5 Conventional Database Framework

Tens of thousands of students apply for different financial assistance and scholarship schemes at every intake. A number of options and schemes are available on the basis of caste, social status of the family, merit cum means, or uplifting of minority communities. The step-by-step procedure currently in place is shown in Fig. 4.

Fig. 4 Flow chart of existing framework

2.6 Implementation of New Framework

The new framework is described by the following algorithm, which is modeled on the Ford-Fulkerson algorithm:

    input: application forms from students
    output: checks f sent to banks for awards

    for each application (u, v) in Database do
        implement clustering approach to distribute the applications
    while there exists an appropriate application to scholarship
    return f

It is based on the following mathematical-style pseudocode for the Ford-Fulkerson algorithm:

    Algorithm Ford-Fulkerson is
        input: Graph G with flow capacity c, source node s, sink node t
        output: Flow f such that f is maximal from s to t

        (Note that f(u, v) is the flow from node u to node v, and c(u, v) is the flow capacity from node u to node v)

        for each edge (u, v) in G_E do
            f(u, v) ← 0
            f(v, u) ← 0

        while there exists a path p from s to t in the residual network G_f do
            let c_f be the flow capacity of the residual network G_f
            c_f(p) ← min{c_f(u, v) | (u, v) in p}
            for each edge (u, v) in p do
                f(u, v) ← f(u, v) + c_f(p)
                f(v, u) ← −f(u, v)

        return f
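A runnable sketch of the Ford-Fulkerson method is given below. It uses breadth-first search to find augmenting paths, which is the Edmonds-Karp variant of the method. How applications and scholarship schemes would be mapped to a flow network is not specified in the text; the small network used here (applicants with capacity 1, scheme quotas as capacities into the sink) and all node names are assumptions made only for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Ford-Fulkerson with BFS path search (the Edmonds-Karp variant).

    capacity maps each node u to a dict {v: capacity of edge (u, v)}.
    Returns the value of a maximum flow from source to sink.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # Residual capacities, with reverse edges initialised to 0.
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a path p from source to sink in G_f.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path remains

        # Bottleneck c_f(p): the smallest residual capacity along p.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Augment along p and update the reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical network: three applicants, two schemes with one award each.
capacity = {
    "s": {"a1": 1, "a2": 1, "a3": 1},
    "a1": {"SC": 1},
    "a2": {"SC": 1, "OBC": 1},
    "a3": {"SC": 1},
    "SC": {"t": 1},
    "OBC": {"t": 1},
}
print(max_flow(capacity, "s", "t"))  # prints 2
```

In this toy network at most two awards can be made, because each scheme has a quota of one, and the maximum flow of 2 reflects that.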

2.7 Description of Framework

The new framework is a cloud-based platform that uses a virtualized cluster of servers in data centers, governed by service-level agreements (SLAs) [4]. Cloud computing is essentially the dynamic provisioning of server, storage, and network resources. The student fills in the application details from his/her mobile phone or laptop. The UID server authenticates the student's identity from his/her UID (unique identification) number and opens the application form. The student fills in the form, attaches and uploads the eligibility documents, and submits it online. The college/university server checks the student's academic, enrollment, and performance credentials and forwards the application online to the district sanctioning authority. The district sanctioning authority verifies the eligibility documents and sanctions the student's claim, which then goes to the line department. The line department collects all claims; checks the authenticity of the district sanctioning authority, the affiliation and recognition status of the university/college, and the upper limit of the amount claimed; finally checks attendance and performance from the linked university/board server; and sends the claims to the Department of Social Justice for release of payment. The block diagram of the new framework is shown in Fig. 5. The Department of Social Justice transfers the money to the UID-linked bank account of the student through Internet banking, using the account number and IFS code given in the student's application. Many students repeat the same activities year after year until they pass. The big data generated in this way, covering admission, financial assistance claims, attendance, performance, etc., is stored in the MongoDB cloud and is available to the line department and the Department of Social Justice for arranging funds, estimation, budgeting, and other decision-making analytics.
The MongoDB database architecture of the new framework is more secure than the old MySQL-based database model. The old model required a lot of server memory, manual re-verification of student performance, and manual deletion of fake and duplicate claims, and it led to delays in releasing scholarships and financial assistance to the students' accounts [5].
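The sequence of checks described in this section can be pictured as a simple staged pipeline. The sketch below is one possible way to model it; the stage names, the dictionary layout of an application, and the `advance` helper are hypothetical and are not taken from the implemented system.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = 1            # form submitted after UID authentication
    INSTITUTE_VERIFIED = 2   # college/university server checked credentials
    DISTRICT_SANCTIONED = 3  # district sanctioning authority sanctioned the claim
    LINE_DEPT_CLEARED = 4    # line department validated the claim
    PAID = 5                 # Department of Social Justice released payment


def advance(application: dict, check_passed: bool) -> dict:
    """Move an application to the next stage, or mark it rejected."""
    if application["rejected"] or application["stage"] is Stage.PAID:
        return application  # nothing further to do
    if not check_passed:
        return {**application, "rejected": True}
    next_stage = Stage(application["stage"].value + 1)
    return {**application, "stage": next_stage}


# Hypothetical application that passes every check.
app = {"uid_token": "T001", "stage": Stage.SUBMITTED, "rejected": False}
for _ in range(4):
    app = advance(app, check_passed=True)
print(app["stage"].name)

# Hypothetical application that fails at the district sanctioning authority.
other = {"uid_token": "T002", "stage": Stage.INSTITUTE_VERIFIED, "rejected": False}
other = advance(other, check_passed=False)
print(other["stage"].name, other["rejected"])
```

Keeping the current stage in the application record means that every transition can be logged and audited without manual intervention.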

Fig. 5 New framework of computerization of Punjab Technical Education

2.8 Hardware and Software Specifications

The new data model requires only a server and the application software that gives access to the MongoDB cloud platform, on which storage can be hired according to need. In the present case, the existing National Informatics Centre (NIC) server is sufficient for controlling the activity. The NIC server hosts the controls of the software application. There is no need to add hard disk space or other memory, which may become expensive year after year. The application software connects all the existing servers, namely the MongoDB cloud, the university/board server for student performance queries, Internet banking, and the UID (unique identification authority) server, to the students' mobile phones, laptops, or tablets. The student can install the software application on a mobile phone or use a laptop to access it over the Internet with a standard browser. All data can be added and processed simultaneously through the application interface. A simple illustration of the hardware parallel processing layout is given in Fig. 6.

Fig. 6 Parallel processing layout depicting jobs of all four servers connected to cloud

The following section gives the basic steps for importing existing data into MongoDB, as needed for developing a computer and mobile application.

3 Importing Data to MongoDB and Comparison

3.1 Importing CSV File into MongoDB

Create a folder on disk C, c:\importMongo. Download the file “ImportDataMongo.rar” from Google Drive and extract the file ImportDataMongo.exe from it into the folder c:\importMongo. Copy the CSV file that is to be imported into MongoDB into the same folder. Then launch the command prompt, change the folder to c:\importMongo, and run the file c:\importMongo\ImportDataMongo.exe.

Note: after the import, the CSV file is moved to the folder c:\importMongo\Archive.
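The internals of ImportDataMongo.exe are not described here. As a rough sketch of what such an import involves, the following code turns CSV rows into JSON-like documents using only the Python standard library. The column names and sample rows are hypothetical, and the final step of sending the documents to the database (for example with a driver call such as PyMongo's `insert_many`) is left out so that the sketch runs without a database server.

```python
import csv
import io
import json


def csv_to_documents(csv_text: str) -> list[dict]:
    """Turn CSV rows into dictionaries, dropping fields that are empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        # Omitting empty fields is natural in a schema-less document store.
        documents.append({k: v for k, v in row.items() if v not in ("", None)})
    return documents


# Hypothetical CSV content; a real file would be read with open(path, newline="").
sample = (
    "uid_token,name,scheme,amount\n"
    "T001,Student A,SC,25000\n"
    "T002,Student B,OBC,\n"
)
docs = csv_to_documents(sample)
print(json.dumps(docs, indent=2))
```

All values remain strings in this sketch; converting fields such as the amount to numbers would be a further step before analysis.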

3.2 Checking the Imported Data in MongoDB

Launch the MongoDB Compass application. Click on the database Sample_StdRec, then on the collection stdRecords, and then on Table to view the imported records.

3.3 Comparison of New Framework with Other Database Systems

The field of education is gaining insight from the large volume and variety of real-time student data. Educational institutions generate huge volumes of data, from grades and test scores to enrollment numbers and scholarships [6]. Big data paradigms are needed to support data mining approaches that increase the efficacy of educational institutions [7]. The use of the MongoDB platform for storing and analyzing educational data is proposed. Modern open-source tools such as MongoDB, coupled with cloud technology, are used to test real data samples on a real-time basis, to analyze students' scholarship data using graphs and charts, and to compare the results with conventional technologies. These databases support frameworks such as MapReduce for processing large amounts of data in parallel. The MapReduce framework works on data mapped onto distributed file systems; intermediate data are stored on local disks and can be retrieved remotely by the reducers [8]. Google’s proprietary MapReduce implementation reads from and writes to the Google File System (GFS). More recently, platforms such as MongoDB, Apache Hadoop HDFS, Hive, Bigtable, and HBase have emerged for storing large amounts of data. MongoDB is useful for storing educational data because it is an open-source, document-oriented NoSQL database system, developed by the company 10gen. MongoDB stores structured data as JSON-like heterogeneous documents with dynamic schemas, and it scales horizontally. It also provides database querying and is suitable for storing educational data because of its scalability and its flexible storage format. The platform is useful for content management and delivery and is attractive because of the features listed below:

  (a) Data are stored as JSON-style documents, and a simplified JavaScript engine is used.

  (b) It supports GridFS for storing data.

  (c) MongoDB is a document database in which one collection (i.e., data store) can hold a variety of documents. The number of fields, the content, and the size can differ from one document to another.

  (d) Conversion of application objects to the structural format of database objects is not needed.

  (e) No complex joins are needed, unlike in traditional database systems.

  (f) MongoDB supports dynamic queries on documents using a document-based query language.

  (g) MongoDB is easy to scale.

  (h) It uses internal memory to store the (windowed) working set, enabling faster access to data.

  (i) An index can be created on any attribute, and fast in-place updates of data are supported.

  (j) It supports replication, sharding, and high availability.
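To make the MapReduce idea and the notion of heterogeneous documents more concrete, the following single-process sketch totals the claimed amounts per year and scheme. It only imitates the map, shuffle, and reduce phases in plain Python; it does not use MongoDB or a distributed file system, and the documents and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical claim documents; the fields differ from one document to another.
claims = [
    {"year": 2015, "scheme": "SC", "amount": 20000},
    {"year": 2015, "scheme": "OBC", "amount": 10000, "remarks": "renewal"},
    {"year": 2016, "scheme": "SC", "amount": 25000},
    {"year": 2016, "scheme": "SC"},  # amount not yet sanctioned
]


def map_phase(document):
    """Emit ((year, scheme), amount); documents without an amount emit nothing."""
    if "amount" in document:
        yield (document["year"], document["scheme"]), document["amount"]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values of one key into a single total."""
    return key, sum(values)


pairs = [pair for doc in claims for pair in map_phase(doc)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(totals)
```

In a real deployment the map and reduce functions would run in parallel on different nodes, while the grouping step would be handled by the framework.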

4 Results and Discussion

Punjab Technical Education provides financial assistance to underrepresented students under the post-matric scholarship (PMS) scheme. This scheme enables free education for scheduled caste (SC) students and other backward class (OBC) students whose parents’ annual income is less than INR 250,000 and INR 100,000, respectively, and whose minimum education in each case is 10th standard (high school). Students can apply for financial assistance every year as per the schedule of the Department of Social Justice and Empowerment of Minorities, which is the implementing department. The payment is made directly to the UID-linked bank accounts of the students/institutes by the Department of Social Justice and Empowerment of Minorities, Punjab. The data of eligible students are processed by the Punjab Technical Education department (DTE), which is designated as one of the line departments. The other line departments are the Department of Medical Education, the Department of Higher Education, and the Department of School Education.
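The eligibility conditions stated above can be expressed as a small check. The income limits and the 10th-standard requirement are taken from the text; the function name, argument names, and category codes are assumptions chosen for illustration.

```python
# Annual parental income limits in INR, as stated for the PMS scheme.
INCOME_LIMIT_INR = {"SC": 250_000, "OBC": 100_000}


def is_eligible(category: str, parental_income_inr: int, passed_10th: bool) -> bool:
    """Check the category, the income limit, and the minimum education."""
    limit = INCOME_LIMIT_INR.get(category)
    if limit is None or not passed_10th:
        return False
    return parental_income_inr < limit


print(is_eligible("SC", 180_000, True))   # income below the SC limit
print(is_eligible("OBC", 180_000, True))  # income above the OBC limit
```

Keeping the limits in a single table makes it easy to update them when the scheme is revised.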

The following consolidated data tables show the year-wise financial assistance claimed by underrepresented students belonging to scheduled castes (SC), whose parents’ income is less than INR 250,000, and to other backward classes (OBC), whose parents’ income is less than INR 100,000. The data shown are for Punjab Technical Education (DTE). The data show that a lot of money is being disbursed to students and that a considerable number of students apply for such financial assistance every year. Students of other line departments, such as medical education, higher education, and school education, also apply for the same assistance, as all students seeking any kind of education are eligible if they satisfy the general eligibility conditions. There is a risk of duplicate claims, as the same student may also be applying through other line departments, as the last column of the tables shows. Manual and conventional data management frameworks were not fruitful, as financial implications were involved (Tables 1 and 2).

Table 1 Year-wise SC students and claims in INR
Table 2 Year-wise OBC students and claims in INR

There were also data security risks, as UID numbers and bank account details were part of the students' personal information.
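The risk of duplicate claims across line departments, noted in this section, could be checked along the following lines. The sketch groups claims by a pseudonymous student identifier and year and reports students who appear under more than one department; the field names, department labels, and sample claims are hypothetical.

```python
from collections import defaultdict


def find_duplicate_claims(claims):
    """Return {(uid_token, year): [departments]} for students who filed
    claims with more than one line department in the same year."""
    departments_by_student = defaultdict(set)
    for claim in claims:
        key = (claim["uid_token"], claim["year"])
        departments_by_student[key].add(claim["department"])
    return {
        key: sorted(departments)
        for key, departments in departments_by_student.items()
        if len(departments) > 1
    }


# Hypothetical claims collected from several line departments.
claims = [
    {"uid_token": "T001", "year": 2016, "department": "Technical Education"},
    {"uid_token": "T001", "year": 2016, "department": "Higher Education"},
    {"uid_token": "T002", "year": 2016, "department": "Technical Education"},
    {"uid_token": "T001", "year": 2017, "department": "Technical Education"},
]
print(find_duplicate_claims(claims))
```

Because the grouping key is derived from the UID, the same check works regardless of how a student's name or address is spelled in the individual applications.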

5 Conclusion

It is concluded that the new framework for the computerization of financial assistance to underrepresented students will help in weeding out duplicate claims across the scholarship schemes of Punjab Technical Education and similar schemes in the country, as the UID server authenticates the student's identity before the application form opens. It will also help in ascertaining student performance, for example by checking attendance and grades from the university server at the same time, without delay. It improves transparency in processing student claims, as there is no manual interface between students and authorities. It also gives Punjab Technical Education increased data security and authorized access to data through the capabilities of the MongoDB-based tool, and it saves a lot of hardware storage, as cloud technology is used and a need-based cloud server can be hired. Considerable improvements in data query times can also be achieved. This may save cost and time, apart from avoiding possible frauds and scams. In the future, students may use only smartphones for all kinds of educational activity, so this framework will be handy for them. Also in the future, combinations of platforms like Hadoop, MongoDB, Cassandra, etc. and parallel programming models like Hadoop, MapReduce, PACT, etc. could be explored for various data analytics techniques to accelerate the analysis of educational data. This will help in building scalable models in the field of education and may provide better scope for improvement in educational analytics, as unstructured data from social media networks can also be utilized to learn about student interests.