1 Introduction

A group is a structure that creates boundaries and sustains interaction among its members. Grouping is essential to human nature, particularly in learning; it is central to human lives, and it is hard to imagine human existence without groups [17]. Being in a group fosters collaboration in the achievement of set goals. Collaboration is rooted in Vygotsky’s theory [36], which explains the Zone of Proximal Development and holds that knowledge is constructed through social interaction among peers in a community. Research has shown that learners feel more engaged when they are given the opportunity to be part of the learning process [7, 17], because such learning is student-centered.

Groups are formed when two or more people interact and influence each other’s discussion in order to learn and to understand learning content more completely [10, 12]. Online collaboration is based on the concept of knowledge construction: knowledge is built gradually through asynchronous online discussion among learners and the instructor. The instructor acts as a facilitator who provides appropriate resources and learning activities and makes use of knowledge of learners’ personality profiles.

This paper investigates the current state of the art in the ways collaborative learning groups are formed, with a particular interest in group types (homogeneous or heterogeneous), learner characteristics, group sizes and algorithms which should inform the automatic formation of learning groups.

2 Review Strategy

This research presents a review of group formation following the guidelines in [32], which give a practical guide to carrying out a systematic literature review in computer science, based on the work in [18]. Although we are primarily interested in automated group formation, the review is deliberately wider, given the limited work in this area, so that it can inspire work on automated group formation.

Research Questions. The following research questions will be answered:

  1. Do teachers consider homogeneity or heterogeneity when forming groups?

  2. What learner characteristics are important criteria for forming groups?

  3. What size is recommended as an ideal learning group for collaboration?

  4. What are the various techniques in use for automated group formation?

Literature Sources and Data Gathering. The study included papers from online databases (IEEE, Springer, ACM, Inspire, Crossref, arXiv, GVK, DBLP, PubMed, PLOS, DOAJ) and search engines (Google Scholar, CiteSeerX), accessed with the support of JabRef. The search strings were constructed using the method in [32], with synonyms for keywords. The search strings used were: (a) (group OR grouping OR team OR teaming) AND (forming OR formation); (b) (group OR grouping OR team) AND (collaboration OR collaborate); (c) (peer OR peering) AND (recommending OR recommendation OR recommender OR recommend); (d) (group OR grouping OR team OR teaming) AND (size OR sizing).

A search on titles, abstracts and keywords yielded an initial set of 105 papers published between 2002 and 2017. This set included papers that did not address the purposes of this study, as well as papers that were stored in multiple databases or published in several sources. To select the final set, we applied inclusion and exclusion criteria.

Inclusion and Exclusion Criteria. A paper is included only if (1) it addresses online or conventional collaboration, or group/team formation; and (2) it was published between 2002 and 2017. In addition, (3) if a paper is duplicated or stored in multiple sources, only one copy is selected; (4) if it has multiple publications, the most recent or the full version is selected; and (5) if it has both a conference and a journal version, the journal version is selected. A paper is excluded if (1) it is not related to education; (2) it is presented in a language other than English; (3) it is only available in the form of a presentation; or (4) it does not address the problem of group collaboration. Applying these criteria reduced the set of study papers to 48.

Quality Assessment. To assess and analyze the selected papers, a nine-item quality assessment checklist was developed and applied as follows:

  1. Venue was evaluated depending on the paper source: (i) For conference and workshop papers, the Computing Research and Education (CORE) rankings were used [8], with values assigned as A = 1.5, B = 1, C = 0.5, No ranking = 0. (ii) For journal articles, the Journal Citation Report (JCR), which reports citation data, was used [16]. Journals are ranked as Q1-Q4, with values assigned as Q1 = 2, Q2 = 1.5, Q3 = 1, Q4 = 0.5, No JCR = 0.

  2. The other items, such as whether the paper had been cited and whether it was relevant to each research question, were rated as Yes = 1, No = 0.

Papers with an overall quality assessment score greater than 5 were included. At the end of this stage, 21 papers had been selected: S01 [3], S02 [5], S03 [6], S04 [9], S05 [20], S06 [21], S07 [25], S08 [29], S09 [31], S10 [34], S11 [37], S12 [38], S13 [1], S14 [2], S15 [4], S16 [13], S17 [15], S18 [22], S19 [23], S20 [26], S21 [30].
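The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating every non-venue item as a single yes/no answer is an assumption based on the description of the checklist.

```python
# Hypothetical helper names; the value tables mirror the checklist above.
CORE_SCORES = {"A": 1.5, "B": 1.0, "C": 0.5}
JCR_SCORES = {"Q1": 2.0, "Q2": 1.5, "Q3": 1.0, "Q4": 0.5}


def quality_score(venue_type, ranking, yes_no_items):
    """Venue value plus one point for every checklist item answered 'yes'."""
    table = CORE_SCORES if venue_type == "conference" else JCR_SCORES
    venue = table.get(ranking, 0.0)  # unranked venues score 0
    return venue + sum(1 for answer in yes_no_items if answer)


def is_included(score, threshold=5):
    """A paper is kept only if its score is strictly greater than the threshold."""
    return score > threshold


# Example: a Q2 journal paper that satisfies four of the yes/no items.
example_score = quality_score("journal", "Q2", [True] * 4 + [False] * 4)
```

In this example the paper scores 1.5 for the venue plus 4 for the yes/no items, so it passes the threshold of 5.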

3 Results and Discussion

Research Question 1 - Which Group Type Is Considered (Homogeneous/Heterogeneous) by Teachers/Instructors When Forming Learning Groups? S01, S02, S04, S05, S06, S13, S17 and S18 advocated the formation of heterogeneous groups, which is in line with the studies in [15, 39]. Only S01 discussed homogeneous group formation, focusing on language preference. The study in [33] also noted that homogeneous groups are less stigmatized.

Research Question 2 - What Learner Characteristics Are Considered as Important Criteria for Forming Collaborative Learning Groups? Most reviewed papers focused on only one learner characteristic for group formation. The exceptions are S01, which considered gender and language preferences; S20, which considered interest and background; and S17, which considered personality traits and performance feedback. In addition, [27] advocated group collaboration with an interest in combining different learner characteristics. S02, S13 and S16 considered the complementary skills of strong and weak learners. S02 suggested that (i) all members should be expert in one of the identified complementary skills, and (ii) only one member can lead the team. This is similar to S07, which proposed combining learners who are more knowledgeable with those who are less knowledgeable, but did not mention how this knowledge would be determined. S04, S05 and S18 considered learning style as a measure for bringing learners together. However, the use of learning styles is controversial, as reported in [11]. S10 considered feedback as a criterion for group formation, maintaining that the quality of previous collaboration is important when forming a group. S11 and S12 mentioned diversity but were not specific about the area of diversification. S17 and S19 proposed personality traits as criteria for forming a group. Research in [24] shows that personality traits are an important factor in collaboration. S02, S03 and S13 proposed obtaining learners’ teamwork profiles using a questionnaire.

Research Question 3 - What Group Size Is Recommended as an Ideal for Learning Collaboration? S05, S11, S12 and S15 suggested small groups without stating how many members should constitute a group. S06 suggested that the size should depend on the tutor’s choice. Only S15 mentioned specific group sizes of 3, 5 and 7. The preference for small groups is supported by the Ringelmann effect (as mentioned in [19]), whereby members become less productive as group size increases.

Research Question 4 - What Are the Various Techniques in Use for Automatically Assigning Learners to Groups? In S01 and S21, a binary integer technique, whose variables take the values 0 or 1, was proposed. Data mining was proposed by S04. There are many types of data mining techniques, which are used to explore and analyze large data sets in order to discover meaningful patterns, as noted by [28, 35], but S04 did not specify which type was used for grouping. S05 and S11 applied genetic algorithms to the group formation problem. The problem consists of a set of students S and a set of groups G; the goal is to allocate every learner in S to a group in G such that the groups are as heterogeneous as possible. The genetic approach in [14] draws learners from the population and changes individuals to form better groups, producing the fittest groups while taking different learner characteristics into account.
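The genetic formulation of the problem can be illustrated with a minimal Python sketch. It is not the implementation of S05, S11 or [14]: the single numeric `scores` attribute per learner, the standard-deviation fitness, truncation selection, order crossover and swap mutation are all assumptions chosen for brevity.

```python
import random
from statistics import pstdev


def split(perm, group_size):
    """Cut a permutation of learner indices into consecutive groups."""
    return [perm[i:i + group_size] for i in range(0, len(perm), group_size)]


def fitness(perm, scores, group_size):
    """Sum of within-group standard deviations: higher means more heterogeneous."""
    return sum(pstdev([scores[s] for s in g]) for g in split(perm, group_size))


def order_crossover(a, b, rng):
    """Keep a slice of parent a in place; fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j]
    rest = [x for x in b if x not in middle]
    return rest[:i] + middle + rest[i:]


def swap_mutation(perm, rng):
    """Return a copy of perm with two randomly chosen learners exchanged."""
    i, j = rng.sample(range(len(perm)), 2)
    child = perm[:]
    child[i], child[j] = child[j], child[i]
    return child


def form_groups(scores, group_size, pop_size=30, generations=200, seed=0):
    """Evolve permutations of learners towards heterogeneous groups."""
    rng = random.Random(seed)
    n = len(scores)

    def score_of(perm):
        return fitness(perm, scores, group_size)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score_of, reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(swap_mutation(order_crossover(a, b, rng), rng))
        population = parents + children
    return split(max(population, key=score_of), group_size)
```

Calling `form_groups([1, 2, 3, 4, 5, 6, 7, 8], group_size=4)` returns two groups of four learner indices. Because the fitter half of each generation is retained, the best fitness never decreases between generations.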

S06, S08 and S09 used approximation algorithms. In S06, group allocation is made by first finding individuals to act as leaders for each group by minimizing a leadership cost function, and then adding individuals to the groups by minimizing a communication cost function (using greedy search). The user provides feedback on the resulting groups by indicating which learners to keep in them, and the algorithm is run again until the user is satisfied. In S08, an initial group allocation is made using learners’ activity ratings; learners with similar preferences are put together. This approximation is then evolved into a final group allocation with the desired number of groups. S09 uses backtracking.
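The two-stage greedy allocation described for S06 can be sketched as follows. This is an illustrative reading rather than the S06 algorithm itself: the inputs `leader_cost` (one value per learner) and `comm_cost` (a symmetric pairwise matrix) are assumed data structures, and the user feedback loop is omitted.

```python
def greedy_groups(leader_cost, comm_cost, n_groups, group_size):
    """Pick the cheapest leaders, then add members with least communication cost."""
    by_leader_cost = sorted(range(len(leader_cost)), key=lambda i: leader_cost[i])
    groups = [[leader] for leader in by_leader_cost[:n_groups]]
    unassigned = set(by_leader_cost[n_groups:])

    while unassigned and any(len(g) < group_size for g in groups):
        # Choose the (learner, group) pair that adds the least communication
        # cost; ties are broken by the lower learner and group index.
        _, learner, g_idx = min(
            (sum(comm_cost[cand][member] for member in g), cand, idx)
            for cand in unassigned
            for idx, g in enumerate(groups)
            if len(g) < group_size
        )
        groups[g_idx].append(learner)
        unassigned.remove(learner)
    return groups


# Toy data (made up for illustration): four learners, two groups of two.
leader_cost = [0.1, 0.9, 0.2, 0.8]
comm_cost = [
    [0, 1, 5, 4],
    [1, 0, 3, 6],
    [5, 3, 0, 2],
    [4, 6, 2, 0],
]
result = greedy_groups(leader_cost, comm_cost, n_groups=2, group_size=2)
```

With this toy data, learners 0 and 2 become leaders and the result is `[[0, 1], [2, 3]]`. A greedy choice of this kind is fast but does not guarantee a globally optimal allocation.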

S07 used a semantic algorithm, which maximized the diversity of knowledge in the groups. Artifacts (such as essays) produced by learners were analyzed to extract the knowledge of each learner. The learners’ concepts were aggregated into a unified data model and used to calculate diversity. S10 is based on a group technological approach, in which similar characteristics are identified and grouped together to take advantage of the similarities. The input data is composed of two matrices: (1) learner characteristics compatibility and (2) assignments of the characteristics to learners. A clustering approach is then used to form groups based on these matrices. S14 used a Bayesian network. Initially, learners were divided into disjoint teams. After every activity, learners evaluate their peers by stating the most predominant role of each teammate. At each iteration, Bayesian learning was employed to update the probability for a learner given the evaluation history; these probabilities are then used to form the next teams.
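The idea of updating role probabilities from peer evaluations can be illustrated with a much simpler model than the Bayesian network used in S14. The sketch below uses a Dirichlet-categorical update as a stand-in; the role labels, the symmetric prior and the function name are assumptions, not details taken from S14.

```python
from collections import Counter

ROLES = ("coordinator", "implementer", "evaluator")  # hypothetical role labels


def role_posterior(evaluations, prior=1.0):
    """Posterior mean probability of each role for one learner.

    evaluations: role labels assigned to the learner by peers so far.
    prior: symmetric Dirichlet pseudo-count per role (an assumed value).
    """
    counts = Counter(evaluations)
    total = len(evaluations) + prior * len(ROLES)
    return {role: (counts[role] + prior) / total for role in ROLES}


# After three peer evaluations the estimate shifts towards "coordinator".
history = ["coordinator", "coordinator", "evaluator"]
posterior = role_posterior(history)
```

With no evaluations every role has probability 1/3; each new evaluation moves the estimate towards the roles that peers report most often, and the updated estimates could then inform the next team assignment.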

Finally, S17 and S20 used ant colony optimization and particle swarm optimization, respectively. The former is inspired by the collective foraging behaviour of specific ant species. In S17, the objective of the algorithm is to maximize the heterogeneity of all groups based on the Goodness Heterogeneous values of all groups. In particle swarm optimization, each particle has (1) a current position in the search space, (2) a current velocity, and (3) a personal best position in the search space. During each iteration, each particle in the swarm is updated using (1) and (2). In S20, each particle represents a distribution of learners over groups.
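For reference, the canonical particle swarm update is commonly written as below. This is the textbook form, and whether S20 uses exactly this variant is an assumption. Here x_i and v_i are the position and velocity of particle i, p_i is its personal best position, g is the best position found by the swarm, ω is an inertia weight, c_1 and c_2 are acceleration coefficients, and r_1 and r_2 are random numbers drawn uniformly from [0, 1].

```latex
\begin{align}
v_i^{t+1} &= \omega\, v_i^{t} + c_1 r_1 \left(p_i - x_i^{t}\right) + c_2 r_2 \left(g - x_i^{t}\right) \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}
\end{align}
```

The velocity is updated first and the new position is then obtained by adding it to the current position. For a discrete problem such as assigning learners to groups, the position must additionally be mapped to a valid allocation, a step not shown here.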

4 Conclusions and Future Work

This paper provided a systematic literature review of group formation for collaborative learning, as a first step towards automated group formation by a computer agent. The findings show that (1) the reviewed papers have not specifically examined which learner characteristics are important when forming a group, but have tended to focus on one particular characteristic; (2) the reviewed papers did not mention an ideal size to consider when forming a group; and (3) the reviewed papers used a wide variety of algorithms, with no studies comparing the relative effectiveness of these algorithms. Our future studies will use a mixed-methods research design with triangulation to determine which learner characteristics to combine to achieve effective collaborative learning groups.