
1 Introduction

As students then and educators now, we often find ourselves following in the footsteps of our predecessors on a mission to spread knowledge and educate the next generations. What may start as one of the most exciting professions soon becomes a more demanding endeavour than we may have hoped. As this paper will mainly focus on, university education requires continuous work, which often leaves us with little to no time to improve our coursework beyond the standard.

As education becomes a commodity that can be accessed across any medium, more questions arise about whether standards can be maintained while engaging an ever-increasing student population.

The increased number of students has placed burdens on the system, which are reflected in the quality of life of educators. Watts and Robertson [1] identify three main characteristics, emotional exhaustion, depersonalization and dissatisfaction, as critical indicators of burnout in universities, and show how burnout affects not only educators' personal lives but also the quality of their teaching.

With these critical findings in mind, how can we guarantee quality as well as an exceptional student experience? Under this pressure, the ideals of young educators fall back to ‘provide the minimum required standard’, or the comfortable one-method-fits-all approach.

This has motivated us to go beyond these thoughts and find ways to make students' lives more rewarding and educators' roles simpler by applying an adaptive framework that needs to be developed only once and updated rarely, and which can provide a higher-quality study experience to our young future professionals. In creating our novel approach, one of the key features was implementing an assessment system based on flow theory. Csikszentmihalyi [2], the creator of this theory, explains that “there exists a state of mind called FLOW, where the user’s engagement and learning are maximized, and that happens when the task ahead is of a well-adjusted difficulty for the user, not to seem too easy, nor too difficult.”

This theory is widely used in the artificial intelligence of video games to keep the player engaged, making the game neither too difficult nor too easy. We adapt it to create a personalized experience for each student. Using this method, each student always faces a task of the right difficulty, since we assume that all students are different and learn at a different pace. This way, students who are slower than others do not face tasks so hard that they drop out, and those who are faster are not bored because the tasks are too easy for them. How we implemented it is described in the experimental methodology.

In this paper, we have taken a simple approach, by our own admission, but further studies will provide more significant insights. Furthermore, given the results of our experiments, we are very optimistic that this is only the foundation of a long-term project which will boldly target changing how we assess students today.

Fig. 1. Flow theory in practice

2 Student Engagement and Gamification

2.1 Engagement and Learning Styles

Student engagement is defined as “students’ involvement with activities and conditions” which aim to facilitate high-quality learning [3]. Improving student involvement has been one of the primary missions and challenges for higher education regardless of educational format. While student engagement derives from many underlying factors, many teaching approaches (e.g., active learning) have been adopted and suggested to either build or enhance student engagement in various higher education fields (Fig. 1).

Given the differences in personality types, professional/educational experience and expectations, and adaptive competencies, individual students have their own ways of gaining knowledge and skills [4]. This raises the question of whether student learning styles should be considered when designing and improving learning structures and courses. While several studies suggest the use of learning style theory as a potential tool to help students improve their learning performance [5], there are concerns about customising courses based on learning styles, mainly due to problems of measurement [6].

According to an AUSSE report, ‘appropriate levels of intellectual challenge along with sufficient education support’ play an essential role in increasing students' involvement in their work, which in turn has a positive effect on their learning outcomes [3]. Given generic learning activities and assessments, it often falls to the individual class tutor or unit chair to generate the ‘appropriate levels of learning challenges’, which can be demanding for many tutors and unit chairs. This challenge may partly explain the growing interest in gamification in many educational fields.

2.2 Gamification in Education

The use of game elements/features has been one of the educational trends of the past few years. Although the focus of individual studies differs, several experimental studies have identified a positive impact of game elements/features on student motivation, attention and learning performance [7]. This positive outcome may be closely related to the characteristics of games. According to the literature, one of the best-known characteristics is the freedom to fail: this reduces learners' fear of failure while experimenting and has also been shown to improve student engagement [8]. In addition, the ability to provide frequent and immediate feedback can be beneficial [9], considering the practical restrictions on providing frequent feedback in a classroom setting. Lastly, gamification also allows learning activities to be adjusted based on the progression of individual learners, an approach known as a Dynamic Difficulty Adjustment system [5]. These characteristics are often discussed as the major benefits of using gamification in education.

3 Dynamic Difficulty Adjustment (DDA)

Much research points to the fact that computer-based learning is highly effective compared to traditional learning, even from a very early stage of an individual's learning and development [10]. Yien et al. [11] provided an experimental group of sixth-grade students with a game-based learning curriculum and established that it was more effective than the traditional curriculum.

Wang and Chen [12] highlighted a fundamental distinction between performance and engagement. According to their research, individuals performed better when they were first given a game to clarify a concept, followed by a challenge game. However, they showed less engagement, or flow as described by Mihaly Csikszentmihalyi [2]. The primary reason could be that participants were asked initial concept-clarification questions that required them to differentiate between important concepts and point out examples, which might pull individuals out of the immersive experience and cause boredom. Therefore, it seems crucial to find the right balance between performance and engagement when developing computer-based learning platforms.

Research supporting computer-based learning has been carried out in recent times. However, the purpose of our study was to go one step further and apply Dynamic Difficulty Adjustment to computer-based learning platforms. Most of the research on Dynamic Difficulty Adjustment has focused on multiplayer games, although work is now also being done on applying DDA to serious games [13]. Serious games are games designed not for entertainment but for education or other purposes. In both scenarios, most of the difficulty ‘adjustment’ revolves around modifying specific parameters and game scenarios to ensure that players do not get bored or frustrated while playing [14]. Our research is the first of its kind in that we aim to apply DDA to purely educational platforms for use in higher education.

Besides ensuring that users do not disengage from their tasks, research has also shown that DDA techniques can assess users' current state and adapt to improve performance; this also helps students maximize their productivity [15]. Despite these qualitative experiments, it has also been claimed that player expertise has a considerable influence on the perception of difficulty [14]. Promising results have been reported for adaptive educational games that adjust features such as task difficulty, object speed and learning content according to the current state of the player [13, 16]. Research has also shown that DDA systems can be used to facilitate the transition of users from novice to expert [17].

One study has shown that, when using DDA in gaming, an AI runtime module called the Experience Engine can dynamically create activities that are actively allocated to players to ensure that the aims of the author or teacher are fulfilled. The individual's profile includes various features, such as skill level, task-type preferences, skill needs and preferences, and learning-style needs/preferences [13]. For these reasons and others, DDA is seen as providing personalized learning for participants. Ung, Meriaudeau and Tang [15] apply it with the aim of improving the outcome, as shown in Fig. 2. Furthermore, research has reported that players obtain quicker performance gains and feel in greater control when DDA is used to match their skill level [18].

Another school of thought has tried to assess participants based on their mental state rather than their performance. This approach showed a more remarkable improvement in participants' performance, and they seemed to be more immersed in the challenge [10]. The study in [19] examines when to trigger DDA in a third-person shooter game, using a unique approach of measuring players' excitement level with an Emotiv EPOC headset that reads electroencephalography (EEG). If the level of excitement drops below a certain threshold, DDA is activated to mitigate the problem. This method addresses degraded game experience and uses excitement level as a proxy rather than a performance score.

Ung, Meriaudeau and Tang [15] propose the design and subsequent application of a functional near-infrared spectroscopy (fNIRS)–dynamic difficulty adjustment (DDA) system. Their experiment involves a total of 25 participants who undergo a control session with Fixed Difficulty Training (FDT) and one with Neurofeedback Training (NFT) that uses the DDA system. The results showed considerable improvement with the DDA-backed system. All of the above research has opened an avenue for DDA to be used in alleviating medical disorders. This idea can be seen in one study claiming that DDA can play a potential role in several fields, including the treatment of cognitive disorders such as Attention Deficit Hyperactivity Disorder (ADHD) [21]. They use visuo-haptic training with DDA, claiming it is the most effective approach to attention training.

When we talk about educational assessment, we see that the personalization brought by DDA can help mitigate the problem of plagiarism as well [22].

Much research has been done on the most viable and effective way of measuring how difficult a task is for an individual. For example, a “Challenge Function” and an “Evaluation Function” are two concepts introduced in [14]. These functions use various quantitative information from the players, assess the game state and the player's skill level, and perform the adjustments that suit the player's abilities. Using heuristic functions is therefore very common when assessing participants' skill levels.

On the other hand, some researchers believe that difficulty adjustments are needed when the individual is mentally fatigued rather than when their skill level is lacking. Their research shows that a drop in subjects' oxygenation level may indicate mental fatigue, leading the participant to be less engaged in the task at hand. In contrast, the oxygenation levels of NFT subjects remained almost constant throughout the experiment. This finding suggests that the proposed fNIRS-DDA system helped participants avoid mental fatigue [15].

Bayesian statistics may be used to dynamically predict or evaluate the difficulty of specific tasks from participants' performance measures. Such probabilistic techniques are more commonly applied in multiplayer games, which are mainly stats-based in nature. Neural networks, the k-nearest neighbours algorithm, and linear and nonlinear regression are other standard models used to assess individual skill levels and future states [20]. All these models aim to predict players' current state and make the necessary parameter adjustments to keep individuals in engaging interaction loops for the required amount of time.

While it has potential benefits, DDA does not come cheaply. Ultimately, DDA AI systems tend to take control away from the author and give it to the algorithm [20]. [4] highlights the need for several trials of user performance to predict accurately; however, other research claims that since we are only aiming for the best fit to the currently available information, several trials may not be necessary.

Fig. 2. An example of how DDA is applied to maximise the final outcome of a test.

4 Research Methods

In creating TuneIn, we wanted to make sure that we would be able to test our assumptions:

  • Assumption 1: that a study path tailored to keep the student in the flow zone (DDA student) will improve their performance, especially in the amount of content absorbed.

  • Assumption 2: Students using the standard method (STD students) will score, overall, less than DDA students.

To run the test, we selected a pool of 500 questions and problems from a Linear Algebra class, divided into five levels of difficulty (the division was already provided by the book used to gather the test questions). To this end, we ran a randomised, double-blind trial carried out through an app. The test consisted of 20 questions with a time limit of 2 min per question. Every time a new student takes the test, the app chooses behind the scenes whether or not to use the DDA mechanism.

In DDA mode, the app presents questions of varying difficulty depending on the student's current level. When the DDA mechanism is not chosen, questions are drawn randomly from levels 1–5, with an equal distribution of difficulty to prevent the student from facing a test with only hard or only easy questions. The result is a random selection of 4 questions per level (20 questions in total), as sketched below.
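
For clarity, the following is a minimal sketch of the random (non-DDA) selection described above. The pool structure, the function name and the final shuffle are our illustrative assumptions, not part of the TuneIn implementation.

```python
import random

def pick_random_test(question_pool, per_level=4):
    """Sketch of the non-DDA selection: draw `per_level` questions from each
    of the five difficulty levels (20 questions in total).
    `question_pool` is assumed to map level (1-5) to a list of questions."""
    test = []
    for level in range(1, 6):
        test.extend(random.sample(question_pool[level], per_level))
    random.shuffle(test)  # assumption: questions are not presented grouped by level
    return test
```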

To evaluate whether the DDA mode is more efficient than random picking of the questions, we used two different DDA models: a simpler one based on mathematical operations (called MathDDA) and another based on Reinforcement Learning (called RLDDA).

4.1 MathDDA

In MathDDA, the level always starts at 1 and is updated depending on whether the last question was answered correctly or not. Each correct answer increases the updated level by \(\frac{1}{3}\), while each mistake decreases it by \(\frac{1}{6}\). The two values differ to reflect that a wrong answer is not intended to be a punishment.

The current level (the level of the question the user will face) during the test is the result of the following formula:

$$\begin{aligned} Y = \mathrm{round}(\text{updated level}) \end{aligned}$$
(1)
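
As an illustration, here is a minimal sketch of the MathDDA update rule and Eq. (1). Clipping the presented level to the 1–5 range of the question pool is our assumption; the rest follows the description above.

```python
def mathdda_update(updated_level, answered_correctly):
    """Update the hidden level (+1/3 for a correct answer, -1/6 for a wrong
    one) and return the level of the next question via Eq. (1).
    Clipping to the 1-5 range is our assumption."""
    updated_level += 1 / 3 if answered_correctly else -1 / 6
    next_level = min(5, max(1, round(updated_level)))
    return updated_level, next_level
```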

4.2 RLDDA

RLDDA, on the other hand, is based on a feed-forward DQN neural network and a custom reward function designed to extract the best outcome from the student.

The basic concept of this model is to train a network with a Q-Learning algorithm to automatically select the next question, with the final aim of maximizing the student's grade.

At the heart of Q-Learning is the function \(Q(s, a)\), which gives the discounted value of taking an action \(a\) in a state \(s\). This value is equal to the reward for taking the action \(a\) in state \(s\) plus the discounted value of all the future states in which the agent will end up. In short, it is the value of picking the optimal action in a specific state, represented by the formula:

$$\begin{aligned} Q(s, a) = r + \gamma \max _{a'}\Big (Q(s', a')\Big ) \end{aligned}$$

The goal of this approach is to find the optimal policy that maximises the reward function:

$$\begin{aligned} \pi (s) = \mathop {\mathrm {argmax}}_a\Big (Q(s,a)\Big ) \end{aligned}$$

where \(\pi (s)\) is the policy at state \(s\), chosen so that the student achieves the highest possible score relative to their current level.
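
As a minimal illustration of the two formulas above (not the authors' implementation), the one-step Q-learning target and the greedy policy can be written as:

```python
import numpy as np

def q_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step target r + gamma * max_a' Q(s', a'); the discount gamma=0.99
    and the terminal-state handling are illustrative assumptions."""
    return reward if done else reward + gamma * float(np.max(next_q_values))

def greedy_policy(q_values):
    """pi(s) = argmax_a Q(s, a): pick the action with the highest Q-value."""
    return int(np.argmax(q_values))
```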

The reward function has to reward the model when the student answers correctly and punish it when the answer is wrong, the latter meaning that the model picked the wrong question level for the student to achieve the highest possible score. To prevent the model from converging to a degenerate policy that presents only low-level questions to maximize its reward, the higher the question's level, the higher the reward/punishment has to be. As in a standard Reinforcement Learning approach, the model has an observation space (state) and an action space, described below.

State. The state provided to the DQN network is an array with the following elements (a minimal encoding sketch follows the list):

  • Level of the previous question (from 1 to 5).

  • Question index (e.g., 4 if it is the fifth question).

  • Number of correct answers.

  • Number of wrong answers.

  • Total reward achieved from the beginning of the test (this also has to be an input since it represents an estimator of the student's level).
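
A minimal sketch of how this five-element state could be encoded as the network input; the ordering and the lack of normalisation are our assumptions.

```python
import numpy as np

def encode_state(prev_level, question_index, n_correct, n_wrong, total_reward):
    """Pack the five state elements listed above into the network input vector."""
    return np.array(
        [prev_level, question_index, n_correct, n_wrong, total_reward],
        dtype=np.float32,
    )
```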

Action Space. On the other hand, the action space is a set of 3 actions (a small mapping sketch follows the list):

  • 0 to decrease the level by 1 compared to the previous question.

  • 1 to increase the level by 1 compared to the previous question.

  • 2 to keep the level the same as the previous question.
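
A small sketch of how an action could be mapped to the level of the next question; clipping the result to the five available levels is our assumption.

```python
def apply_action(prev_level, action):
    """Map an action (0: decrease, 1: increase, 2: keep) to the next level."""
    delta = {0: -1, 1: +1, 2: 0}[action]
    return min(5, max(1, prev_level + delta))  # keep the level within 1-5
```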

Network Structure. The network structure, as shown in Fig. 3, is composed of an input layer with 5 nodes, 2 fully connected hidden layers with 12 nodes each, and an output layer with 3 nodes. This last layer is responsible for outputting the estimated Q-values for the 3 different actions the network can perform.
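
For illustration, a minimal sketch of the 5-12-12-3 network from Fig. 3; the framework (PyTorch) and the ReLU activations are our assumptions, as they are not specified in the text.

```python
import torch.nn as nn

class DQNetwork(nn.Module):
    """Sketch of the DQN: 5-element state in, Q-values for 3 actions out."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(5, 12), nn.ReLU(),   # input layer: 5-element state
            nn.Linear(12, 12), nn.ReLU(),  # two hidden layers of 12 nodes each
            nn.Linear(12, 3),              # output layer: Q-values for 3 actions
        )

    def forward(self, state):
        return self.layers(state)
```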

Reward System. In order to provide an evaluator that the network can use to assess the quality of its own decisions, a simple reward function is implemented. This reward function aims to direct the network toward the globally optimal policy by giving every action a score (reward/punishment system). The network's only objective is to maximize the value of the reward in the long run.

The reward in this experiment is a score corresponding to the coefficient of the question presented (the coefficient is based on the level), where the sign is positive if the answer is right (reward) and negative if the answer is wrong (punishment). The coefficients are as follows: level 1: 0.5, level 2: 0.6, level 3: 0.7, level 4: 0.85, level 5: 1.0.
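
A minimal sketch of this reward function using the coefficients above; the names are illustrative.

```python
# Level coefficients taken from the text above.
LEVEL_COEFFICIENTS = {1: 0.5, 2: 0.6, 3: 0.7, 4: 0.85, 5: 1.0}

def question_reward(level, answered_correctly):
    """Positive coefficient if the answer is right, negative if it is wrong."""
    coeff = LEVEL_COEFFICIENTS[level]
    return coeff if answered_correctly else -coeff
```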

Fig. 3. DNN structure

Fig. 4. Application flow, simplified to allow users from different domains to comprehend the end-to-end solution.

4.3 Application Flow

As shown in Fig. 4, the application randomly assigns every user to one of the groups and then performs the final assessment on the result. Although simple, the application is fully functional and was offered as a service. The participants were drawn from a cohort of people of both genders, between 18 and 23 years of age. All participants had at least completed high school and were currently enrolled in a scientific university course (to guarantee that all the students involved had already covered the fundamentals of Linear Algebra). Coming from a university background and thinking about future adoption, we strategically decided to focus on the part of the education world that is faster than others at implementing change, so we chose the university setting.

5 Preliminary Findings

After conducting our pilot experiment, we cleaned our data. We had a sample of 99 students who completed their assessment, split equally into three groups, each assigned to one approach. Every cluster of 33 students took the test under the same conditions, with the same time constraint, differing only in the algorithm that picked the questions.

As shown in Fig. 5, the Random approach led to an average of 46 points, compared to 54 and 57 points for MathDDA and RLDDA respectively. The DDA approach is therefore beneficial for the students' outcomes.

Fig. 5. Visual representation of the final score in the three groups.

The second observation was that the DDA-backed cohorts attempted more questions than the control group (8% more on average), while the control group attempted fewer questions or dropped out of the assessment more frequently. The main explanation could be that students taking their assessment on the DDA platform felt more engaged and confident: with DDA, they never faced questions far too difficult for their actual level.

Fig. 6. Visual representation of the average time spent per question in the three groups.

Finally, it was also noteworthy that the DDA groups spent less time on each question on average, 31 and 32 s for MathDDA and RLDDA compared to 34 s for the random approach, as shown in Fig. 6. Possible explanations fall in line with the flow theory of learning: the student is more engaged and performs much better on the dynamic platform, which adjusts the difficulty of questions and ensures that the student remains within the flow channel.

6 Discussion

This paper has revealed some interesting insights that are bound to become the foundation for much more extensive exploration. The few elements we have used and the assumptions we have put forward were proven correct within the limitations of our experiment. The outcomes were relatively straightforward, limiting our current phase to simple validation.

In this early experiment, we showed how a DDA approach can improve student performance by shaping the test to the student's specific knowledge level and making them comfortable with the test level. Given that the user usually never faces a question too difficult by orders of magnitude, we strongly believe that this psychological factor is key to keeping them focused on the test and confident in their knowledge.

7 Limitations and Future Research

There are several limitations to this study. Firstly, the sample size is limited because the study is only based on the first phase of data collection. In addition, data collection focused only on capturing times and levels; we did not have information specific to the individuals in our cohorts, only general information. Given these limitations, this research will be extended with another phase of data collection to test student engagement and performance and to identify critical factors/features that play an essential role in shaping the student's experience of using the platform.

In the future, we will collect more precise information about our participants to fine-tune the difficulty levels. We will also explore different topics beyond mathematics to verify whether or not the theory still holds in non-STEM subjects.