1 Introduction

Developing computer-based automated techniques for assessment is referred to by multiple terms in the existing literature, such as Automated Grading of Essays [1], Computer Automated Scoring [2], and simply Automated Scoring [3]. The use of machine learning [4] and NLP techniques [5] for this purpose is continuously increasing. These automated scoring techniques assess various skills such as essay writing, a physician’s patient-management skills, dentistry assessment skills, and the architect registration process [2]. Teachers spend a significant amount of time assessing students’ performance on descriptive work such as research papers [6], articles, theses, reports, and PowerPoint presentations. Grading such work is challenging because it is time-intensive and susceptible to inconsistencies and inaccuracies on the part of evaluators.

Creating and delivering a convincing presentation is an indispensable soft skill to be imparted during graduate programs. However, evaluating presentations is a challenging task because it is often subjective, needs to comply with institute-specific rubrics, and is a time-consuming mechanical activity. To overcome these challenges, we present the design of a data-driven approach to grade students’ presentations. The approach is based on extracting features that contribute to the quality of a presentation and grading the presentation on quality parameters rather than on the content it covers; the task of grading a presentation based on content can be delegated to a human expert. To simplify grading, presentations are graded on two different parameters: the first is presentation quality, and the second is the accuracy and authenticity of the topics covered. Our main objective is to evaluate the effort students put into preparing PowerPoint presentations.

Many researchers have developed techniques for automated grading of explanatory answers in various languages [7,8,9,10,11], programming code, and research papers. These approaches use a combination of technologies to grade descriptive assignments. Automatic Grading of Essays (AGE) relies on Natural Language Processing (NLP) and Machine Learning (ML). The adoption of NLP [12,13,14,15] for AGE is driven by the motivation to handle linguistic issues such as multiple meanings of words in different contexts [16], while machine learning techniques are employed to extract features and perform the grading task. For example, one of the earliest approaches, called Project Essay Grader (PEG), extracts a set of language-based features using NLP and uses multiple regression techniques to grade essays. Linguistic features can also be extracted using NLP and deep learning approaches [17,18,19,20].

2 Challenges in Grading Students’ Presentations

Grading students’ PowerPoint presentations against a set of quality parameters involves checking the slides prepared by students against the norms set by an institute or the criteria expected of a professional presentation. When a teacher starts assessing the quality of presentations with this view, the challenges faced are numerous. Some of these are listed below.

  1. Lack of Systematic Assessment Methodology: As observed in [21, 22], assessment is usually subjective, driven by the knowledge, experience, and mindset of evaluators (e.g., harsh vs. lenient raters). The use of rubrics is often suggested to overcome this subjectivity. However, many evaluators rely on their experience to grade students’ presentations instead of using predefined rubrics.

  2. Absence of Automated Approaches: Creating and delivering a presentation is a compulsory activity in most undergraduate and postgraduate programs, and many educators consider judging presentation quality a time-consuming, routine activity. Yet very few researchers have employed advances in machine learning to automate the task of grading students’ presentations.

  3. Absence of Standardized Dataset: The development of data-driven and machine learning-based automated techniques depends on the availability of quality datasets, and no standardized dataset of graded students’ presentations is available.

  4. No Universal File Format for Presentation Files: Students create presentations using different presentation software, which supports various file formats such as ppt, pptx, odp, and pdf. This variety makes it difficult to develop automated techniques for feature extraction.

  5. Evolving Functionalities in Presentation Software: Presentation software such as Microsoft Office, LibreOffice, and Google Docs continuously adds new functionalities and features. Students’ presentations may be power-packed with graphical images, infographics, videos, hyperlinks, and appropriate templates. Hence, defining a comprehensive feature set becomes difficult.

One way to address these challenges is to define quality assessment criteria and adopt automated techniques to reduce the assessment process’s subjectivity.

3 Methodology

A machine learning model is a mapping function f that transforms input data X into output data Y.

$$\begin{aligned} Y= f(X) \end{aligned}$$

where X is an n-dimensional vector holding the input data, also known as the input features. Building a machine learning model includes the following steps.

  1. Identification of Features and Data Collection: This preparatory step aims to collect the input data necessary to build the model, identify the input features, and develop techniques to automatically extract those features from the collected observations. Sometimes the data needs to be normalized or re-scaled to bring all input features within the same range.

  2. Feature Selection: The performance of machine learning-based models depends on the set of input features used and the correlation among them. Feature selection aims to identify the minimal number of features required for optimal performance. Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are typically used to determine the optimal set of features needed to build machine learning models [23].

  3. Model Development: Model development involves two phases. In the training phase, a model is built using one of the machine learning techniques discussed in the following section; the data set used to build the model is typically referred to as the training data set. In the testing phase, the model’s performance is checked against unobserved or unknown data.

  4. Model Evaluation: This step evaluates the performance of the developed model against metrics such as F1-score, accuracy, recall, and precision. It indicates how well the model responds to unobserved data points.
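The four steps above can be sketched end-to-end with scikit-learn. The feature matrix, labels, and split below are illustrative placeholders, not the actual dataset:

```python
# Minimal sketch of the model-building steps using scikit-learn.
# X (one row per presentation) and y are randomly generated placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.random((26, 24))            # 26 presentations, 24 features
y = rng.integers(0, 2, size=26)     # 0 = non-acceptable, 1 = acceptable

# Step 1: re-scale all features into the same range
X_scaled = MinMaxScaler().fit_transform(X)

# Step 3: split into training and testing data, then fit a model
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)
model = DecisionTreeClassifier(criterion="gini").fit(X_train, y_train)

# Step 4: evaluate on the held-out (unobserved) data
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```

Feature selection (step 2) would be applied between scaling and splitting; it is shown separately in Sect. 6.2.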

3.1 System Architecture

Figure 1 shows the architecture of our system for automated grading of PowerPoint presentation slides. Solid lines show the training process of the model, and dotted lines show the testing process. The dataset contains PowerPoint presentation samples that have already been graded manually by experts; each sample is an input feature vector of 24 features representing the quality of a PowerPoint presentation, and the output is the presentation’s grade. After training and testing, the model is ready to grade ungraded PowerPoint presentations: the feature vector of an ungraded presentation is given directly as input to the trained machine learning model, which predicts its grade.

Fig. 1. Grading of PowerPoint presentations

4 Data Collection

We have collected the presentation slides prepared by students to deliver a course seminar. Delivering a course seminar is a compulsory activity in the undergraduate curricula of Engineering programs offered by Indian universities; it is included in the curricula to develop students’ communication and presentation skills. No formal training is provided to the students on developing PowerPoint slides using presentation software (e.g., MS Office); students learn to use this software on their own as part of the course seminar activity. Students select a topic on emerging technology and deliver a talk of about 20 minutes using PowerPoint slides.

The course seminar is typically evaluated on the effort a student puts into preparing PowerPoint slides, the coverage of the selected topic, and the student’s communication skills. We aim to check the PowerPoint slides prepared by students for presentation quality, not for topic coverage or communication skills. We have collected twenty-six PowerPoint presentations. The collected slides are used to extract the features required to build the machine learning model. The PowerPoint slides, as well as the dataset generated after feature extraction, have been made available on GitHub (https://github.com/jyotiborade/Presentation_grading/).

5 Feature Identification and Selection

In grading students’ presentations, the goal of feature engineering is to identify attributes or input variables that can help us evaluate the quality of a presentation. For automatic grading of essays, many researchers have either used linguistic features that may be lexical, syntactic, or semantic, or used automated techniques based on deep learning to extract relevant features. We have identified the following set of features to assess the quality of the slides included in a presentation:

  1. Bullet Features: These features capture information about the number and types of bullets used by a student in a presentation.

  2. Text Appearance: These features mainly capture information about the appearance of text on a slide. We presume that a presentation that is diverse in terms of text color, font size, and font type attracts the audience’s attention.

  3. Image and Animation: This set of features captures information about the use of images and animation. A presentation that includes images, diagrams, graphs, and charts usually conveys information effectively.

  4. Text Formatting: These features capture information about the formatting of the text, such as whether lengthy paragraphs are included in the presentation and whether hyperlinks and internal navigation links are provided to move smoothly across the presentation.

  5. Output Features: We have included two different output variables. The first indicates whether the presentation is of an acceptable standard. The second further grades an acceptable presentation as Excellent, Very Good, Good, or Fair.

Table 1. Features for bulleting and text appearance
Table 2. Features for images, animations, and text formatting

Many programming languages such as Java, JavaScript, and Python provide libraries to process PowerPoint presentations created in PPT format. For example, Apache POI is the Java API for handling Microsoft documents, while python-pptx and nodejs-pptx are programming interfaces to handle PPT files in Python and JavaScript, respectively. Features can be extracted from PowerPoint slides using the python-pptx library [24]. Deep neural network-based approaches such as autoencoders can also be developed for automatic feature extraction and are often preferred for feature engineering. In contrast to the linguistic features extracted for automatic grading of essays, the features mentioned above focus on the non-technical aspects of PowerPoint presentations. As shown in Tables 1 and 2, we have used features describing bullets, images, fonts, colours, hyperlinks, headers and footers, animation, etc., which capture various aspects that enhance the quality of PowerPoint presentations. The output features capture a teacher’s evaluation of the slides, made with or without knowledge of the features we use to develop the automated techniques.

6 Model Development

This section briefly reviews the various algorithms used to build classifier models and the implementation of our approach.

6.1 Classifier Algorithms

The grading of PowerPoint presentations can be treated as a classification problem. Binary classification can be used to differentiate PowerPoint presentations into two broad categories, labeled acceptable and non-acceptable, according to whether they satisfy presentation norms. Further, multi-class classification can be used to grade the PowerPoint presentations as Excellent, Very Good, Good, or Fair. The following machine learning algorithms are used in our work to assess the grade of PowerPoint presentations.

  1. Decision Tree (DT): A popular classifier modeling technique that represents the classification problem in the form of a decision tree. The tree has two types of nodes, viz., decision nodes and leaf nodes. The leaf nodes indicate the predicted class label; in our case these are Excellent, Very Good, Good, and Fair for multi-class classification. Decision nodes test features against specific criteria useful for assigning the class labels. We have used the Gini index for the construction of the decision tree.

  2. Logistic Regression (LR): The logistic regression technique is derived from linear regression. In linear regression, the relationship between the input and output variables is modeled as a linear function, described below.

    $$\begin{aligned} Y= \beta _0 + \beta _1x_1 + \beta _2x_2 + ...+\beta _nx_n +\epsilon \end{aligned}$$

    In logistic regression, a sigmoid function is used to transform the output values into the range between 0 and 1. Class label 0 is assigned when the output of the sigmoid function is less than 0.5, and class label 1 is assigned when it is greater than or equal to 0.5. The sigmoid function is:

    $$\begin{aligned} g(z)= \frac{1}{1+e^{-z}} \end{aligned}$$

    where

    $$\begin{aligned} z=h_\beta (x)= \sum _{i=0}^{n} \beta _i x_i \end{aligned}$$

    We have used the ‘liblinear’ solver for fitting the model, with max_iter set to 100 and the penalty set to ‘l2’, i.e., L2 regularization.

  3. Multi-Layer Perceptron (MLP): Typically used when the input and output variables are related through a nonlinear function. Since our training data is small, we have used the ‘lbfgs’ solver. We have set the regularization parameter alpha to \(1e-5\) and used two hidden layers with (18, 8) neurons.

  4. Support Vector Machine (SVM): It builds a classifier that maximizes the separation between data points and the decision boundary. For the implementation of SVM, we have set the penalty term C to 180 to avoid misclassification and used a ‘linear’ kernel to classify data points.

  5. K-Means Clustering (KM): It groups a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. We have used this algorithm for binary classification only; hence the number-of-clusters parameter is set to 2.

  6. Naive Bayes Classifier (NB): It assumes that the input features are independent of each other and are of numeric or continuous data types. It uses Bayes’ theorem to assign a class label \(C_{k}\) given a data observation x. We have applied the default parameters for this classifier.

    $$\begin{aligned} p(C_{k}|x) = \frac{p(C_{k})p(x|C_{k})}{p(x)} \end{aligned}$$
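For reference, the classifiers with the hyperparameter settings described above can be instantiated in scikit-learn as follows (a sketch; parameter names follow scikit-learn’s API):

```python
# The six algorithms with the hyperparameters stated in the text.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

classifiers = {
    "DT": DecisionTreeClassifier(criterion="gini"),          # Gini index splits
    "LR": LogisticRegression(solver="liblinear", max_iter=100, penalty="l2"),
    "MLP": MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(18, 8)),
    "SVM": SVC(C=180, kernel="linear"),
    "KM": KMeans(n_clusters=2),                              # binary classification only
    "NB": GaussianNB(),                                      # default parameters
}
```

Each estimator exposes the same `fit`/`predict` interface, so they can be trained and compared in a single loop over this dictionary.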

6.2 Implementation

We have implemented a proof-of-concept classifier model. A data set of 26 PowerPoint presentations has been collected, and we have manually extracted all the features mentioned in Tables 1 and 2. Two teachers have separately validated the correctness of the extracted features. The PowerPoint presentation slides have been graded separately by two different evaluators, to whom we did not disclose the machine learning models’ features; their agreed-upon grade has been recorded in the data set as the output label. To reduce correlated features, we have used linear discriminant analysis (LDA), a feature reduction technique. We have implemented the machine learning models using Python’s Scikit-learn library. From the dataset of PowerPoint presentations, 70% are used for training and 30% for testing.
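A minimal sketch of this preprocessing pipeline, assuming a placeholder 26 × 24 feature matrix in place of the real dataset:

```python
# Sketch: LDA feature reduction followed by a 70/30 train-test split.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((26, 24))             # placeholder for the 24 extracted features
y = rng.integers(0, 2, size=26)      # placeholder acceptable / non-acceptable labels

# LDA projects the features onto at most (n_classes - 1) discriminant components,
# discarding correlated directions that do not separate the classes.
X_reduced = LinearDiscriminantAnalysis().fit_transform(X, y)

X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.3, random_state=42)
```

With two output classes, LDA keeps a single discriminant component; the reduced features then feed the classifiers of Sect. 6.1.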

7 Model Evaluation

Quantitative metrics such as F1-score, precision, recall, and accuracy are used to evaluate classifier systems. To define these evaluation metrics, true values and predicted values play an important role. In our case, a true value is the class label assigned by a human evaluator to the output variables acceptable and grade, and a predicted value is the class label assigned by a particular classifier. The first step in evaluating any classifier’s performance is to prepare a table called a confusion matrix, which represents the numbers of true values and predicted values in the form of a matrix, as shown in Table 3. Due to lack of space, it is not possible to present the confusion matrix of each classifier; Fig. 2 shows the confusion matrix of the Decision Tree classifier as an example.

Table 3. A general format for confusion matrix of binary classifier
Fig. 2. Decision Tree binary classification confusion matrix

The output of a classifier model can be divided into the following four types, which map onto the four cells of a confusion matrix as shown in Table 3. These are:

  1. True Positive (TP): The total number of correctly predicted positive values. In our classifier, the value of TP is 10.

  2. True Negative (TN): The total number of correctly predicted negative values. In our classifier, the value of TN is 20.

  3. False Positive (FP): The actual class is 0, but the predicted class is 1. In our classifier, the value of FP is 0.

  4. False Negative (FN): The actual class is 1, but the predicted class is 0. In our classifier, the value of FN is 0.

$$\begin{aligned} Precision = \frac{TP}{TP+FP}=\frac{10}{10+0}=1 \end{aligned}$$
$$\begin{aligned} Recall = \frac{TP}{TP+FN}=\frac{10}{10+0}=1 \end{aligned}$$
$$\begin{aligned} F1-Score ={2 \times \frac{Recall\ \times \ Precision}{Recall + Precision}}= {2 \times \frac{1\ \times \ 1}{1+ 1 }}=1 \end{aligned}$$
$$\begin{aligned} Accuracy = {100\times \frac{TP+TN}{TP+FP+FN+TN}}={100 \times \frac{10+20}{10+0+0+20}}=100 \end{aligned}$$
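The calculations above can be reproduced with scikit-learn by reconstructing labels that match the reported confusion-matrix cells (TP = 10, TN = 20, FP = FN = 0):

```python
# Recomputing precision, recall, F1-score, and accuracy from the reported counts.
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

y_true = [1] * 10 + [0] * 20   # 10 actual positives, 20 actual negatives
y_pred = [1] * 10 + [0] * 20   # no misclassifications: FP = FN = 0

print(precision_score(y_true, y_pred))        # 1.0
print(recall_score(y_true, y_pred))           # 1.0
print(f1_score(y_true, y_pred))               # 1.0
print(accuracy_score(y_true, y_pred) * 100)   # 100.0
```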

Precision, recall, and F1-score for all the classifiers are shown in Tables 4 and 5. The Decision Tree attains the highest value, i.e., 1, for all these metrics.

Table 4. Precision, recall, F1-score for multiclass classification
Table 5. Precision, recall, F1-score for binary classification
Fig. 3. Accuracy for multiclass and binary classification

The bar charts in Fig. 3 show the accuracy of the various machine learning classifiers. Logistic Regression, Naive Bayes, Decision Tree, and Support Vector Machine perform well when predicting the class labels of presentations. The Decision Tree-based classifier predicts the output class with 100% accuracy in both types of classification, and the MLP, SVM, and Naive Bayes classifiers give an accuracy of more than 80% in both multiclass and binary classification. This shows that the features we have considered for grading students’ presentations are appropriate and give an acceptable level of classification performance. Moreover, the grades predicted by the Decision Tree are more relevant than those of the other classifiers. We have used K-means only for binary classification; it performs poorly compared with the other classifiers, which may be due to the small dataset.

8 Conclusion and Future Work

In this paper, we have presented an approach to assess students’ presentation skills. Students demonstrate their presentation skills by preparing and delivering a PowerPoint presentation. To simplify the development of an automated technique for evaluating such presentations, we separate the technical content of a PowerPoint presentation from the presentation quality manifested through the various functionalities supported by presentation software. To demonstrate the approach, we have identified a set of features useful for determining presentation quality and developed a small data set to enable the development of machine learning techniques. A data-driven approach to assessing presentation skill is demonstrated through various prototype classifiers; the Decision Tree predicts the grade of a PowerPoint presentation with 100% accuracy.

In the future, we plan to take delivery aspects such as the speaker’s volume, communication skills, time and content management, and topic selection into consideration for automatic grading. The performance of the various classifiers also needs to be fine-tuned.