1 Introduction

Since the early 1960s, programming for children has been a broad field of research. Drawing on Papert’s constructionist ideas [1], a number of programming languages for children and novice users have been developed [2]. Initially, programming tools were based on text and graphical user interfaces (GUIs) and later on tangible user interfaces (TUIs). Using a keyboard or mouse, text-based and GUI systems allowed users to create programming structures either by typing commands or by dragging and connecting icons on a computer screen. Later, dispensing with keyboard and mouse, other developers created TUI-based languages in which children could acquire programming experience simply by interacting with physical objects [3, 4]. Some of the most influential languages of this kind include AlgoBlock by Suzuki and Kato [5], Electronic Blocks by Wyeth and Purchase [6] and Quetzal–Tern by Horn et al. [7].

Despite the various design efforts and theoretical frameworks [3] related to TUIs and children’s education, there is a lack of (a) empirical research investigating the possible advantages of TUIs over graphical interfaces [8, 9] and (b) design guidelines and available tools that combine tangible and graphical programming capabilities [10]. Thus, this work presents a series of design guidelines aiming to offer ideas and directions to future designers and, by comparing tangible and graphical programming tools, provides empirical evidence on the possible performance advantages and disadvantages of tangible versus graphical programming for children.

2 Background

TUIs in general may be considered physical objects whose manipulation may trigger various digital effects, providing ways for innovative play and learning for children or novices [11]. Taking advantage of this playfulness, developers have created applications in fields such as quiz representation [12], storytelling [13], home automation [14] and introductory programming for children [4, 15].

Programming is quite a challenging process for beginners of all ages [2]. Users have difficulties not only in learning a rigid syntax and commands with confusing names, but also in using the programming environment itself [16]. Reducing learning difficulties originating from the programming environment is considered one of the advantages of TUIs, since TUI users do not have to become familiar with a keyboard or mouse and are naturally skilled at directly manipulating physical objects, such as cubes and puzzles [17]. Consequently, it is believed that tangible systems reduce the cognitive effort required to learn how the system works, so users’ attention may be devoted exclusively to the programming itself [18].

The systems that seem to have influenced the development of tangible programming are (a) AlgoBlock [5], which was the first to introduce cube-shaped commands, (b) Tangible Programming Brick by McNerney [19], which was the first to clearly introduce the use of parameters, (c) Electronic Blocks [20], which let children build programmable robots or simple mechanisms by simply snapping the block primitives together, and (d) Tern [21], which was the first to employ a scanning system to identify the connected commands and thus create the program sequence through automatic identification.

Although pioneering work has already been done on the construction of such systems, there is still a lack of available tools [10] and design guidelines.

2.1 Comparison studies between graphical and tangible programming languages

A number of studies attempting to compare tangible against isomorphic graphical systems in various domains have reported conflicting results, which strongly emphasizes the need for further research to examine the circumstances under which each type of interface offers more benefits in a real-classroom context [22, 23]. In the domain of programming in particular, there is a notable lack of such empirical studies [3]. Comparative works on tangible and graphical systems with analogous features (such as similar shape, appearance and functionality) have produced interesting findings, focusing primarily on fun and enjoyment [24]. Kwon et al. [10] compared Algorithmic Bricks with Scratch [25]; however, the systems were not isomorphic. Horn et al. [26] compared a passive tangible against a similar graphical programming language in an informal learning setting at the Boston Museum of Science. This study revealed some advantages of the tangible programming language over the graphical one. For instance, it was found that the passive tangible language was more attractive and more supportive of active collaboration.

Finally, Sapounidis and Demetriadis [22] explored children’s opinions regarding tangible and graphical programming. The results showed that the tangible interface was considered more attractive (especially by girls), more enjoyable and, only for younger children, easier to use.

Even though tangibles are believed to be more efficient than GUIs, there is limited research that systematically explores the cognitive and social advantages of TUIs compared to GUIs [8, 27]. In particular, the impact of tangible environments and the conditions under which the handling of tangible objects can be more efficient for children or novices, in domains such as programming, have not been studied sufficiently and remain largely unexplored [2, 18].

2.2 Research motivation and questions

Taking into account the limited empirical research [9, 28] on TUIs, particularly in relation to quantitative outcomes [29], the present work sets a twofold objective: first, to illustrate a series of design principles in order to offer new ideas and directions to designers of such tools; second, to provide empirical evidence on the possible (dis-)advantages of TUIs (compared to isomorphic GUIs) when used for children’s introductory programming. The measurements and data analysis aimed to address children’s performance on predetermined tasks and during free interaction. In detail, regarding performance on tasks, the research questions are:

  • RQ1. Do children perform significantly better with one specific interface when task-solving time is considered?

  • RQ2. Do children perform significantly better with one specific interface when the number of errors is considered?

  • RQ3. Which interface is more appropriate for efficient debugging when an error occurs?

Regarding free interaction session, the research questions are:

  • RQ4. When children may freely use either a GUI or a TUI programming interface, do they engage more with one specific interface?

  • RQ5. When children may freely use either a GUI or a TUI programming interface, do they produce longer programs with one specific interface?

  • RQ6. Do children explore more commands and parameters when working with one specific interface?

  • RQ7. When children may freely use either a GUI or a TUI programming interface, what is the complexity of the code they develop?

Answers to the above questions would provide insight into the possible (dis-)advantages of TUIs versus GUIs in introductory programming for children.

3 The tools implemented

3.1 Design consideration guidelines

Although TUIs are generally believed to be easier to learn and use, they are more difficult to design and build than traditional interfaces [30]. In designing and constructing our system, we took into account the suggestions, advantages and deficiencies of previous design approaches as they appear in the related literature.

In detail, the issues we addressed were the following:

Both tangible and graphical interface

A system that offers both tangible and graphical programming capabilities allows a fluid and balanced transition between TUI and GUI, in relation to the user’s age and experience [7]. Furthermore, the possibility of programming with two isomorphic environments gives a unique opportunity to study the advantages of tangibles relative to GUIs (e.g., [8, 31]).

Cost and portability

Because of cost and complexity issues, the construction of tangible systems for programming is considered resource-consuming for research purposes [30], a fact that might explain the lack of evaluations in real classroom settings [10]. To reduce cost and simultaneously achieve high portability, we developed an active tangible programming language based on reliable microcontrollers and low-cost D9 and D25 connectors. This way, the programming activity can take place on any surface, allowing the activity to develop freely in a more dynamic way [32].

Reduction in the physical and conceptual distance between input and output

To eliminate the distinction between tangible actions and their effects, which are the results of the implemented program, both actions and effects should be presented in the same physical place [15, 32–35]. To achieve a better physical coupling between performed actions and their effects, in our approach users program in the real environment with real cubes and inspect the outcome of their program in the same physical space with a real robot [36, 37].

Collection of commands and parameters

Supporting procedural programming with an enhanced set of commands and parameters makes system usage interesting and challenging even to older children [16]. To satisfy the above, our system was designed to support a plethora of commands beyond just the “move” commands and more parameters than just numbers; moreover, it allows users to familiarize themselves with concepts such as procedure, repetition and condition [2]. Furthermore, the system introduces the functionality to save and reuse program code created by someone else [38, 39] or to exchange it with other users [40].

Interaction with the user on the interface

Increased user–system interaction is a desirable aspect in this kind of system [16, 34]. For this purpose, we developed two additional functions: the first informs the user about the internal state of the system and the second informs the user about any possible syntax error. Both are conveyed through appropriate indications on the interface itself, so no external screen or other means are needed.

Physical properties and characteristics

The system should have properties and physical characteristics that are useful during interaction. In our approach, for the tangible subsystem we used a combination of connectors to set appropriate constraints on the users [41]. The goal is to prevent users from plugging a parameter cube into the place of a command cube or connecting the blocks the wrong way round. This physical property may reduce the users’ cognitive load because the interface itself “informs” users about what they should not do. Finally, it seems that the inclusion of certain material characteristics in the interfaces (temperature, shape, color, texture, sound, etc.) may bring the interfaces closer to the concepts and operations they depict [42, 43]. For this reason, in our tangible subsystem, cube weight has been adjusted accordingly; for example, the parameter 4 cube has twice the weight of the parameter 2 cube.

Availability

To increase the system’s availability, the whole design was built upon common sensors and a Lego NXT robot, which is easy to find in many schools. Finally, batteries are not required for the blocks or, in general, for the whole system, so continuous operation is assured without recharging.

Reliability

To increase system reliability, serial protocols were employed; consequently, the number of commands that can be connected is not limited by the I/O ports of the microcontrollers used [3]. Furthermore, the cubes have embedded intelligence and are able to carry out self-check procedures concerning issues such as the quality of the power supply and connectivity with neighboring blocks. If a problem is detected, each cube tries to deal with it on its own. In any case, the user receives an indication of proper operation on the blocks.

Affordances

The interface should be able to depict a wide variety of concepts [34, 44]. To this end, by making some minor modifications to the computer software and changing the photos on the boxes, different cognitive tasks can be implemented with the same graphical and tangible system [45].

3.2 PROTEAS kit

PROTEAS (PROgramming TangiblE Activity System) is a kit comprising one graphical robot programming tool (V_ProRob) and two tangible ones, T_Butterfly [46] and T_ProRob. In the following, we present the tangible T_ProRob and the graphical V_ProRob subsystems, which were used in our study.

T_ProRob

The T_ProRob (tangible) subsystem consists of Plexiglas command and parameter cubes. By combining the cubes, users are able to program a Lego NXT robot. An indicative program structure is shown in Fig. 1.

Fig. 1 Father box and an indicative program with T_ProRob

In this program, the Lego NXT robot will do the following: (a) move two steps backwards, (b) make a sound, (c) execute a nested loop in which, three times, the robot pauses and then, with the inner loop, moves along a square route, and (d) when the nested loop has been completed, carry out a check using the light sensor: if no light is detected, the robot’s lamp will turn on.
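For readers more familiar with textual code, the cube sequence of Fig. 1 can be sketched roughly as follows. The command names (step_backward, make_sound, etc.) and the four-step square route are our own illustrative assumptions; the actual systems use physical cubes and icons rather than text.

```python
# Rough textual equivalent of the Fig. 1 cube program (hypothetical command names).
def fig1_program(robot):
    robot.step_backward(2)            # (a) two steps backwards
    robot.make_sound()                # (b) make a sound
    for _ in range(3):                # (c) outer loop, repeated three times
        robot.delay()                 #     pause
        for _ in range(4):            #     inner loop: move along a square route
            robot.step_forward(1)
            robot.turn_right()
    if not robot.light_detected():    # (d) check the light sensor
        robot.lamp_on()               #     turn the lamp on if no light is detected
```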

Users can issue robot movement commands such as “move one step forward/backward” and “turn left/right,” as well as commands such as “turn off the light” and “make a sound.” Furthermore, repetition and condition programming structures are available, supporting more complicated combinations such as nested repetitions and conditions. Finally, a special cube, in which users can save their program code and reuse it later as a function in other programs, completes the set of commands.

After connecting the desired commands and parameters to the “father box,” the user may initiate program execution by pressing the run button located on top of the “father box.” The father box then “reads” the program and forwards it to a remote computer over an RS232 cable or Bluetooth. The computer records the program in a database and, after compilation, transmits the code to the Lego NXT robot for execution over Bluetooth. The bidirectional communication between robot and cubes allows for increased user–system interaction. While the program is executed, the robot can, for example, report to the condition command cube that the result of a measurement was positive or negative; the condition cube then informs the user by turning on the appropriate LED. Furthermore, users are informed synchronously of a potentially “wrong” (non-acceptable) parameter connection through an LED indication on the parameter cube.
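Conceptually, the execution pipeline described above can be summarized in a few steps. The sketch below is a simplified illustration under our own naming assumptions (scan_connected_cubes, log_to_database, etc. are hypothetical); the actual logic is embedded in the cube microcontrollers and the companion software.

```python
# Simplified sketch of the T_ProRob execution pipeline (hypothetical function names).
def run_program(father_box, computer, robot):
    tokens = father_box.scan_connected_cubes()   # father box "reads" the cube chain
    computer.log_to_database(tokens)             # program is recorded for later analysis
    bytecode = computer.compile_program(tokens)  # compilation on the remote computer
    robot.execute(bytecode)                      # code sent over Bluetooth and executed
    for event in robot.feedback():               # bidirectional feedback during execution
        father_box.show_led(event)               # e.g., condition results shown on cube LEDs
```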

V_ProRob

V_ProRob was designed based on T_ProRob’s functionality and offers a reliable, isomorphic graphical equivalent. The subsystem presents on screen the same features and operations as T_ProRob.

In Fig. 2, an indicative program structure with V_ProRob is depicted. The functionality of this program is identical to that of the program previously presented with T_ProRob. With this graphical subsystem, users can create program sequences by arranging the available commands and parameters with a “drag and drop” interaction technique. V_ProRob also supports bidirectional communication between the robot and the graphical environment. The communication is achieved using Bluetooth and allows the subsystem to provide feedback to the user over the icons of the commands and parameters, much like the tangible interface does.

Fig. 2 Indicative program with the graphical interface

4 Method

4.1 Participants

The study was conducted in a public school in the area of Thessaloniki, Greece. One hundred and nine children from five age groups, 6–7 (N = 20), 7–8 (N = 25), 9–10 (N = 14), 10–11 (N = 25) and 11–12 (N = 25) years, participated in the study.

All children volunteered to participate as part of their everyday school activities, and they were randomly assigned to work in pairs. None of the participating children were familiar with the systems, and all spoke Greek as their native language. All children had some familiarity with the mouse. Figure 3 shows children interacting with the tangible subsystem.

Fig. 3 Children interacting with the tangible subsystem

4.2 Setting and procedure

Experiments were conducted in classrooms that were appropriately arranged so that the two subsystems were equally accessible to all children. Three subject-matter experts, one at the front and two at the back, along with video–audio recorders and computer logs, recorded the whole process.

Children, guided by the researcher, first filled out questionnaires about their age, gender, familiarity with computers and knowledge of computer programming. Then, the Lego NXT robot was presented and, following a simple scenario, the researcher showed the children how they could program the robot with both systems. To rule out any possible sequence effect, we counterbalanced the order of presentation.

Then, two missions (Task1 and Task2) were assigned to the children, while a third mission (Task3) was assigned to the older children. The first mission (Task1) was a simple sequential program of up to six commands involving cubes such as “move forward/backward,” “turn on/off the light” and “make a sound.” The second mission (Task2) was a more advanced sequential program with parameters; it was up to six commands and involved cubes such as “move forward/backward,” “turn right/left,” “turn on/off the light” and “make a sound.” The third mission (Task3) was the most complex; it was up to five commands and additionally involved the repetition (loop) structure.

Children were asked to accomplish the programming missions using one interface (selected so that half of the pairs started programming using the graphical interface and the other half using the tangible); subsequently, children had to accomplish missions of the same difficulty using the other interface.

After the task sessions, the researcher left the children free to create two more programs, using each system in turn. During this free session, children created programs without any predetermined task scenario or time limit.

4.3 Measurements on tasks

The data collection procedure employed video–audio recordings and system database logs. The recordings were analyzed by three experts (each responsible for one measured variable). By transcribing the video–audio recordings and database logs, we measured three variables referring to children’s programming performance with the two systems, namely (1) time to accomplish tasks (TAT, the time needed to accomplish the programming tasks), (2) errors (ER, the number of erroneously executed programs) and (3) debugging stages (DS, the debugging stage reached after an error).

For assessing the debugging stages, the Carver and Klahr model [47] was used and a three-level ordinal variable was created. Using the available video–audio recordings and computer logs, we determined whether the children, after a wrong execution, (a) arrived at the correction of the bug (full debug), (b) noticed that a specific error had occurred but were unable to locate and correct it (partial debug), or (c) did not notice any discrepancy between the goal and the actual outcome, so no debugging took place (no debug).

The above dependent variables were examined as a function of gender and age.

4.4 Measurements on free interaction

By transcribing the video–audio recordings and computer logs, we measured four variables referring to children’s free programming performance with the two systems: (4) free interaction time-engagement (i.e., the time children spent freely interacting with the systems to create their programs), (5) program length, (6) program vocabulary and (7) program complexity.

For assessing program length and vocabulary, we used the Halstead software metrics [48]. In our case, following Halstead, the program length is the total number of commands and parameters, while the vocabulary is the number of unique commands and parameters within a program. Furthermore, to examine program complexity, the logical structure of the program was analyzed using the McCabe cyclomatic complexity measure [49]. All the above metrics are independent of the programming language itself [50]. To avoid bias, during our analysis we capped the program length for both interfaces at the maximum length of a program a user can see on a computer screen.
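To make the metrics concrete, the sketch below computes length, vocabulary and a simple cyclomatic complexity estimate for a program represented as a flat list of tokens. The token names and the convention of counting each loop or condition cube as one decision point are our own simplifying assumptions for illustration.

```python
# Minimal sketch of the metrics used, assuming a program is a flat list of
# command/parameter tokens and each loop or condition adds one decision point.
def halstead_length(tokens):
    """Halstead length: total number of commands and parameters."""
    return len(tokens)

def halstead_vocabulary(tokens):
    """Halstead vocabulary: number of unique commands and parameters."""
    return len(set(tokens))

def cyclomatic_complexity(tokens, decision_tokens=("loop", "if_light")):
    """McCabe complexity: 1 plus the number of decision points."""
    return 1 + sum(1 for t in tokens if t in decision_tokens)

# Example: the Fig. 1 program as a token list (illustrative names).
program = ["backward", "2", "sound", "loop", "3", "delay",
           "loop", "4", "forward", "1", "turn_right", "if_light", "lamp_on"]
print(halstead_length(program))        # 13 tokens in total
print(halstead_vocabulary(program))    # 12 unique tokens ("loop" appears twice)
print(cyclomatic_complexity(program))  # 1 + 3 decision points = 4
```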

5 Results

5.1 Task performance analysis

5.1.1 Task1

The time to accomplish Task1 (TAT1) had a negative Pearson’s correlation with age for both interfaces, r = −0.63 (p < 0.001) and r = −0.68 (p < 0.001) for the tangible and the graphical interface, respectively. These relations are also depicted in Fig. 4, which shows the mean time children needed to accomplish Task1 with both interfaces as a function of age group. TAT1 decreases as the age group increases and reaches a plateau after the 10–11 years age group.

Fig. 4 The mean time children needed to accomplish Task1 as a function of age group for both interfaces

Focusing on the differences in TAT1 between the two interfaces within the same age group, a t test and a nonparametric Wilcoxon signed-rank test were used. Both showed that, with the tangible interface, TAT1 was significantly lower for the 6–7 years (p = 0.049), 7–8 years (p = 0.030) and 9–10 years (p = 0.012) age groups, while no statistically significant differences were observed for the 10–11 and 11–12 years age groups.
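For illustration, the per-age-group comparison and the age correlation can be expressed as a short analysis script. The sketch below uses SciPy's paired t test, Wilcoxon signed-rank test and Pearson correlation; the numbers and variable layout are made up for the example, not the actual study data.

```python
# Illustrative sketch of the statistical comparisons (made-up numbers, not study data).
import numpy as np
from scipy import stats

# Task1 completion times (seconds) for the same child pairs within one age group,
# once with the tangible interface and once with the graphical one.
tat1_tangible = np.array([210.0, 185.0, 240.0, 198.0, 225.0])
tat1_graphical = np.array([260.0, 231.0, 255.0, 240.0, 270.0])

# Paired comparison between the two interfaces within the age group.
t_stat, t_p = stats.ttest_rel(tat1_tangible, tat1_graphical)
w_stat, w_p = stats.wilcoxon(tat1_tangible, tat1_graphical)

# Pearson correlation of completion time with age, pooled over all groups.
ages = np.array([6.5, 7.5, 9.5, 10.5, 11.5])
pooled_tat1 = np.array([250.0, 220.0, 180.0, 150.0, 145.0])
r, r_p = stats.pearsonr(ages, pooled_tat1)

print(f"t test p={t_p:.3f}, Wilcoxon p={w_p:.3f}, Pearson r={r:.2f} (p={r_p:.3f})")
```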

5.1.2 Task2

The time to accomplish Task2 (TAT2) had a negative Pearson’s correlation with age for both interfaces, r = −0.42 (p < 0.001) and r = −0.44 (p < 0.001) for the tangible and the graphical interface, respectively.

These relations are also depicted in Fig. 5, which shows the mean time children needed to accomplish Task2 as a function of age group. Time decreases as the age group increases and reaches a plateau after the 9–10 years age group.

Fig. 5 The mean time children needed to accomplish Task2 as a function of age group for both interfaces

Focusing on the differences in TAT2 between the two interfaces within the same age group, both a t test and a nonparametric Wilcoxon signed-rank test showed no statistically significant differences for any age group.

5.1.3 Task3

In Task3, which was the most demanding and difficult, only the 10–11 years (N = 7) and 11–12 years (N = 17) age groups participated.

Figure 6 shows the difference between the tangible and graphical cases for the two age groups. A t test with bootstrap (p = 0.001) and a nonparametric Wilcoxon signed-rank test (p = 0.038) showed that TAT3 with the tangible interface was statistically lower than with the graphical one for the 10–11 years age group, but not for the 11–12 years age group.

Fig. 6 The mean time children needed to accomplish Task3 as a function of age group for both interfaces

5.1.4 Errors

Figure 7 shows the percentage of erroneous tasks for each interface. In all cases, more errors occurred with the graphical interface. Examining the differences, we found statistical significance only in Task2 (χ2 = 3.96, p < 0.05).

Fig. 7 The percentages of erroneous tasks in each interface

5.1.5 Debugging

Focusing on the error-correction process, the debugging variable was analyzed after each unsuccessful execution in both interface settings. The two distributions are presented in Fig. 8, which shows a significant difference between them (χ2 = 8.79, df = 2, p < 0.05). Thus, in the TUI case it is more likely that errors are fully corrected, while in the GUI case errors are more likely to be overlooked.
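The comparison of the two debugging-outcome distributions corresponds to a standard chi-square test on a 2 × 3 contingency table (interface × debugging stage). The counts below are purely illustrative, not the study data.

```python
# Illustrative chi-square test on a 2x3 contingency table (interface x debugging stage).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: tangible, graphical; columns: full debug, partial debug, no debug (made-up counts).
observed = np.array([[18, 6, 4],
                     [9, 8, 11]])

chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, df={df}, p={p:.3f}")
```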

Fig. 8 Distributions of errors for both interface settings

Concerning gender, no significant effects were found for any of the dependent variables.

5.2 Free interaction analysis

5.2.1 Interaction time-engagement

When free interaction time-engagement is considered, no correlation with age exists. To examine the differences between the two interfaces, a t test and a nonparametric Wilcoxon signed-rank test were used.

Both showed that, with the graphical interface, the free interaction time-engagement measurements were significantly lower for the 10–11 years (p = 0.002) and 11–12 years (p < 0.001) age groups, while no statistically significant differences were observed for the other age groups. These effects are represented in Fig. 9, which shows the mean interaction time for free programming activities as a function of age group.

Fig. 9 The mean free interaction time–engagement as a function of age

5.2.2 Program length

Program length had a positive Pearson’s correlation with age for both interfaces, r = 0.276 (p < 0.01) and r = 0.220 (p < 0.05) for the tangible and the graphical interface, respectively. Comparing program length between the two interfaces during free interaction, no significant difference exists. Program length in relation to age is presented in Fig. 10.

Fig. 10 The mean program length during free interaction

5.2.3 Program vocabulary

The program vocabulary had a positive Pearson’s correlation with age for both interfaces, r = 0.417 (p < 0.001) and r = 0.402 (p < 0.001) for the tangible and the graphical interface, respectively.

Focusing on the differences in program vocabulary between the two interfaces within the same age group, a t test and a nonparametric Wilcoxon signed-rank test were applied. Both showed that, with the tangible interface, program vocabulary was significantly higher for the 6–7 years (p = 0.041), 10–11 years (p = 0.036) and 11–12 years (p = 0.007) age groups, while no statistically significant differences were observed for the 7–8 and 9–10 years age groups. Program vocabulary in relation to age is depicted in Fig. 11.

Fig. 11 Program vocabulary as a function of age

5.2.4 Program complexity

Analyzing the internal complexity of the programs created during free interaction, a positive Pearson’s correlation with age exists for both interfaces, r = 0.673 (p < 0.001) and r = 0.595 (p < 0.001) for the tangible and the graphical subsystem, respectively.

Focusing on the differences in program complexity between the two interfaces within the same age group, a t test and a nonparametric Wilcoxon signed-rank test were applied. Both showed that, with the tangible interface, complexity was significantly higher for the 10–11 years (p = 0.017) and 11–12 years (p = 0.001) age groups, while no statistically significant differences were observed for the other age groups. Figure 12 presents the complexity of the freely created programs as a function of age.

Fig. 12 Program complexity during free interaction as a function of age

6 Discussion

In this paper, we presented a series of design guidelines for tangible programming tools and, using the PROTEAS kit, carried out a comparative study of two isomorphic interfaces. The first group of tasks (Task1) was deliberately chosen to be easily accomplished by the children. The second group of tasks (Task2) was more challenging, and for this reason a number of programming errors occurred. The third and most difficult group of tasks (Task3) was assigned only to some of the older children, which allowed us to focus the study further on these particular age groups. Finally, children were left free to interact with both systems and create one to three programs under each condition. The programs accumulated during this free interaction session were analyzed using simple software quality metrics.

6.1 Time to accomplish the tasks

Regarding the time to accomplish the tasks, TAT decreases with age in all tasks for both the tangible and the graphical interface. This effect is anticipated considering the more developed skills of older children. Focusing on the comparison between the two interfaces, TAT1 for the tangible subsystem was significantly lower, especially for younger children. This might be explained by the difference between children’s familiarity with similar tangible games, such as LEGO bricks and puzzles, and their experience with computers. For older children, computer experience is greater, and thus the TAT1 differences between the two interfaces are expected to diminish. Finally, without statistical significance, TAT1 for the graphical interface becomes even lower at the age of 11–12. This trend is evident in all tasks and might indicate that this particular age is the threshold at which children reach the computer skills that allow them to further reduce their TAT with the GUI. To investigate this effect further, we introduced Task3, engaging the two older age groups. Although the differences became clearer, as we expected, significance was reached only for the 10–11 years age group and not for the 11–12 years group.

In general, the present findings based on time to complete a task are consistent with the results reported in [22] and, at the same time, extend the results of Xie et al. [8], who reported, based on the self-reports of children aged 7–9 years, that it was easier for them to interact with physical pieces than with the mouse. Summarizing, one may say that familiarity with tangibles, on the one hand, and accumulated experience with computers, on the other, are most probably the factors explaining our findings [22].

6.2 Errors

Regarding programming errors, children made more mistakes with the graphical interface in all tasks. However, only in Task2, which was of moderate difficulty, was the number of errors with the tangible interface significantly lower. This significant difference was not observed in the other two cases: in the easiest task (Task1), a negligible number of errors occurred, while in the most difficult task (Task3), the statistically non-significant difference might be due to the small number of children engaged.

The decreased number of errors while programming the robot with the tangible subsystem might be due to active participation and motivation. It has been reported that children participate more actively in programming activities with tangibles than with graphical interfaces [7]. Several studies have shown that working together on specific collaborative tasks may increase children’s enjoyment, engagement and motivation (e.g., [51, 52]). In the case of tangibles, since both participants can be simultaneously active, motivation is increased; in contrast, in the graphical case, one child is more likely to be a passive and perhaps less motivated spectator [53].

6.3 Debugging

Children’s interaction with tangibles appeared to facilitate debugging after an error occurred. Using the tangible interface makes it more likely that a full debugging process and correction of the error will take place, whereas with the graphical one the error is more likely to be neglected. This finding might again be attributed to the active involvement of both group members in the case of tangibles, whereas in the case of the graphical interface one member uses the mouse and programs as the active participant, while the other is more likely to be a passive observer. Working with tangibles, both users participate equally [54, 55], so perhaps they feel equally responsible for inspecting the executed program. Furthermore, with tangibles, both members have better visibility of the working plane [13, 56] and can consequently detect possible inconsistencies more easily.

6.4 Free interaction time-engagement

Free interaction time with a system appears to be well correlated with engagement and is therefore usually interpreted as an indication of it [8, 26]. In our case, this measure does not appear to be correlated with age for either interface. Interestingly, focusing on the differences between the two interfaces, the older groups (aged 10–11 and 11–12 years) showed higher engagement with the tangible subsystem. Our results support our previous research [22], which showed, based on qualitative and quantitative analysis, that older children in particular were highly motivated and enjoyed tangible programming the most. This result raises questions about the nature of the programming code that the children produced during their free interaction time, especially in the tangible case, where children spent more time.

6.5 Program length

In the literature, one measure of programming quality is program length. In our case, this measure shows that, as they grow older, children create longer programs. This is expected given the more developed mental skills of older children. When program length is examined in relation to the tangible and graphical subsystems, no differences between the two cases appear to exist. This result is fully in accordance with the results reported by Horn et al. [26], who measured and compared the number of commands in programs created with a passive tangible and a graphical programming language in an informal learning environment.

6.6 Program vocabulary

In terms of the different commands and parameters used within a program, the results interestingly indicate that children in general preferred to explore the available capabilities mostly in the tangible case. This trend is statistically significant for the youngest and the older children (6–7, 10–11 and 11–12 years of age).

The results for this dependent variable may partially answer the question of what children did during their free interaction time. Older children (aged 10–11 and 11–12 years) spent more time in the tangible case (Fig. 9), and it seems that they explored more alternative commands and parameters. In contrast, younger children spent almost the same time in both cases, but the outcome appears to differ between them. A possible explanation for the younger children can be found by taking a closer look at the TAT1 variable (time to accomplish Task1). Younger children are able to manipulate tangibles in less time to produce the same program as with the graphical interface. Consequently, spending the same amount of time with both interfaces means that they have more time to explore the possible alternatives with the tangible subsystem. This result is in accordance with the findings of Schneider et al. [29], who reported that pairs using a tangible interface were more explorative of alternative logic designs.

6.7 Program complexity

By measuring the number of linearly independent paths through the program code, we obtain the McCabe cyclomatic complexity metric. This software metric is sensitive to repetition and conditional programming structures. Examining the results depicted in Fig. 12, we may infer that younger children did not use many repetition and conditional structures in their programs with either the tangible or the graphical subsystem. Perhaps children of this age preferred simple structures, or perhaps they were not yet ready to use repetition and conditional structures appropriately. In contrast, older children (aged 10–11 and 11–12 years) were able to use these complicated structures with both interfaces, but the programming outcome was significantly more complex in the tangible case. This outcome may provide an additional answer to the question of what kind of programming code the children produced during their free interaction time. Older children not only spent more time in the tangible case (Fig. 9) but were also more explorative, as explained before, and this exploration, together with the extra time spent with the tangible subsystem, led them to produce more complex programs. Our results are consistent with those of Schneider et al. [29], who reported that tangibility tends to increase exploration and consequently enhance performance in simple logic tasks.

7 Conclusion and future research

In this paper, we presented a comparative study of children’s performance using two isomorphic subsystems (TUI vs. GUI).

Data analysis of the task measurements showed that younger children needed less time to accomplish the programming tasks when using the tangible interface. In contrast, older children, who were more experienced computer users, needed almost the same time to accomplish the tasks with both interfaces. Furthermore, fewer programming errors occurred and better debugging was achieved in the tangible case.

Moreover, data analysis of the free interaction showed that older children were more engaged, explored more alternative commands and parameters and created more complex programs with the tangible subsystem. In contrast, younger children spent almost the same time in both cases, but it seems that they were more explorative with the tangibles.

The above findings could be explained by considering both children’s familiarity with games that resemble tangibles and their prior computer experience. Moreover, we have proposed that additional factors, such as the ability to see the work better, to collaborate and to participate actively, have also implicitly affected our results.

Regarding our initial design guideline of providing both a TUI and a GUI, our findings suggest that a balanced transition between the two alternative interfaces, in relation to user experience and to whether children interact on tasks or freely, may be beneficial for certain users. Our initial goals regarding portability and availability also proved crucial for carrying out our experimentation, which took place in many classrooms and on many surfaces over a long, continuous period. Additionally, our findings regarding complexity, program length and program vocabulary highlighted the need for a rich and adequate collection of commands and parameters. In any case, more focused research is needed to evaluate how these findings impact design.

In conclusion, the results supporting TUIs in introductory programming encourage further investigation of learning and performance, as well as of possible training applications, such as with adult novices or individuals with special needs. In addition, modern programming techniques, such as pair programming or object-oriented programming, may be combined with tangibility and studied to provide additional insight into early programming.