Introduction

Collaboration is exclusively a human activity because of our capacity to create group-based norms and to regulate our actions by them. In contrast, other animals, even “our closest living primate relatives lack normative attitudes and therefore live in a non-normative socio-causal world structured by individual preferences, power relationships, and regularities” [1]. Humans are “normative animals”. Moreover, an extensive literature shows that norms have played a key role in the evolution and maintenance of human cooperation, collaboration, social institutions, and culture [1,2,3]. Human normativity is the foundation of humanity’s evolutionary game-changer: large-scale collaboration among unrelated strangers. This capacity for normativity lies at the core of “uniquely human forms of understanding and regulating socio-cultural group life” [1].

Social norms are ubiquitous and pervasive in human interactions. Despite that, they remain one of the main unsolved problems in social cognitive science. According to Fehr and Fischbacher,

although no other concept is invoked more frequently in the social sciences, we still know little about how social norms are formed, the forces determining their content, and the cognitive and emotional requirements that enable a species to establish and enforce social norms [4].

Social norms are “standards of behavior that are based on widely shared beliefs how individual group members ought to behave in a given situation” [4]. Norms tell us “what to do” and “how to do it” in different situations; they regulate our actions when we engage with others in a team. Normative negotiations are often observed in team interactions in the form of questions such as “What are we doing?” and “How are we doing what we are doing?”. Those questions seek agreement on a constitutive norm (what it is that we are doing) and on regulative norms (how we do what we are doing). This study seeks to expand our understanding of team dynamics and collaboration by examining its norms and the social pragmatic cues that afford collaboration.

However, the study of norms faces three challenges: (1) norms are largely invisible, as they are implicit in the interaction of the participants; (2) norm-based environments are complex dynamic systems, in which norms are not “simply being imposed on agents a priori” [5]; and (3) norms are situated, embedded in the specific setting of a practice. Normativity is part of the “ongoing negotiation of identity and cultural meaning” of a community of practice [6].

In the first part of this paper, we lay out our theoretical approach, which brings together (a) complex systems theory, (b) an enactive-ecological approach to cognition, and (c) social learning systems as a framework for studying social norms in the context of creative collaboration.

In the second part, we present a preliminary study that serves as a proof of concept for a quantitative approach to studying social norms using body signals and facial expressions.

Theoretical Background

Norms as Media for Coupling of Systems: A Complex Systems Theory Approach

This study understands creativity as a process that emerges through collaboration. The creative process has its own existence in the form of a self-organizing and emergent system. The background theory is systems theory from biology, sociology, and creativity [7,8,9].

Creative collaboration: Norms as media for coupling of systems

Teams are self-organizing social systems that emerge from the effective coupling of psychic systems (team members). As a system of systems, a team has a precarious existence: it depends on the strength of the structural coupling between systems, which is uncertain and may or may not happen. The coupling occurs through perception–action between the team members and their environment. To provide certainty to this coupling, media, which are evolutionary artifacts, need to be in place to guide the perception and action of team members [7]. On our account, norms are media that facilitate the structural coupling between team members and thereby afford the emergence of collaboration. In other words, norms are cultural outcomes that increase our chances of surviving collaboratively.

In social systems theory, language is the most common medium that facilitates the coupling. When it comes to collaboration, language alone is not enough. As argued above, social norms are the media that make the structural coupling between team members possible. However, in everyday social interactions agents have to infer whether an act is normative from subtler, social pragmatic cues [1]. Doing that requires capacities for intersubjectivity or collective intentionality and shared values in a group. That is the “ability to share attention and mental states (e.g., intentions, goals) with conspecifics and thus to engage in shared intentional activities” [1]. Norms facilitate the emergence of a team’s shared perception.

Norms in the Environment: An Enactive-Ecological Approach

Our perception and our actions are guided by norms present in our environment [10]. Norms are perceived by the agents through affordances [11]. Affordances, in a broad sense, offer the agent possibilities for action. In the case of norms, they indicate not only which actions are possible but also how to act. Those affordances emerge from the interaction between the agent and the socio-material environment, that is, the social interaction and cultural setting as well as the physical context in which collaboration emerges [12].

An enactive-ecological account of norms implies that norms are dynamic, emerging in the socio-material landscape. In this sense, “norms must also be understood as an embodied and situated practical sensitivity to the unfolding dynamics of the here-and-now contextual particularities of practices” [13]. Therefore, affordances are possibilities for skilled action depending on the competencies that the group of agents has. The different norms that emerge are defined by the practice and experiences of the team members and the setting in which they are situated—the rich socio-material landscape of affordances [14].

Norms in a Community of Practice

Norms belong to a particular practice, a form of life, a setting [10, 12, 15]. And here we understand teams, as social systems, to be a social learning system in the form of communities of practice [16]. According to Wenger, the engagement in a community of practice involves a dual meaning-making process through participation and reification. Participation is materialized through direct engagement in activities, conversations, and reflections; reification in the production of physical and conceptual artifacts—words, tools, concepts, stories that organize our participation [6].

Normativity is a form of reification; norms are conceptual artifacts that coordinate and anchor our perception and participation. They provide a common meaning to the shared experience as a team and community.

Social Norms as Solicitations to Collaborating

The authors take an enactive-ecological approach, in which social norms are dynamic and context-dependent socio-material affordances for collaborative activity. Social norms offer the agent possibilities for collaborative action with others in the form of pragmatic social cues. These social cues are context-dependent, and they belong to every practice.

Using Physiological Data and Facial Expressions Synchrony to Study Norms in Creative Collaboration: Individual and Team Level

The understanding of teams as complex systems implies a multidirectional causality of its dynamics. As in situated cognition, “one of the fundamental concepts is that cognitive processes are causally both social and neural. A person is obviously part of society, but causal effects in learning processes may be understood as bidirectional” [17]. The same applies to teams and individuals. The team behavior and its normative status affect the physiological responses of the team member. Likewise, the team member’s physiological processes affect team behavior and team norms. The descriptive and normative accounts of reality are dependent phenomena and are causally related. That action–perception–reaction is part of the normative negotiation of that community.

In the paper “Socially Extended Cognition and Shared Intentionality”, Lyre [18] offers a more detailed account of the aspects of the environment that can provide the social cues for the coupling. He claims that “virtually all mechanisms studied in social cognition […] can be seen as potential coupling mechanisms of social extension” [18]. Building on his suggested list, we have added aspects from the enactive-ecological approach and categorized the mechanisms according to the distinction between normative (Table 2.1) and descriptive (Table 2.2). The normative mechanisms for collaboration cannot be accessed directly; they need to be inferred from the descriptive pragmatic social cues. The descriptive mechanisms for collaboration can be accessed directly: they are available in the socio-material environment in the form of pragmatic social cues that together constitute affordances to collaborate.

Table 2.1 The normative mechanism for collaboration
Table 2.2 Descriptive mechanisms for collaboration—pragmatic social cues

The social pragmatic cues of a specific practice can be inferred by integrating multiple descriptive, factual aspects of the socio-material environment. Specifically, the intersubjective descriptive mechanisms need to be integrated with the situated descriptive mechanisms in order to infer the normative aspect of the social situation. The social norms that guide our perception and action are key to preserving the harmony, or the chaos, of the community in which that collective experience is situated.

To capture this bidirectional causality between the individual and the team, this study leverages advanced approaches that use digital technology for data collection and analysis. In this research in progress, we collected and analyzed physiological data using wearable devices, facial expressions using video recordings, and perceived team cohesion via self-assessment. In the study, participants engaged in a creative collaboration task, which provides a proof of concept for using body signals to understand the norms that drive creative collaboration.

To clearly frame the scope of this preliminary study, two research questions are addressed:

  1. What are the right theoretical frameworks for the study of group-based norms in creative collaboration?

  2. How can new computer-powered automatic data collection and analysis methods contribute to the quantitative study of social norms in creative collaboration?

Study Design

Participants

For this preliminary experiment, 10 participants (6 females and 4 males) were recruited from a group of graduate students at a digital engineering institute in Germany. All of them are researchers in the field of computer science with a highly homogeneous northern European cultural background, and none of them was a native English speaker. The participants were between 21 and 28 years old and had no previous experience working together. Participation was voluntary and not subject to any payment. The participants were paired based on their time availability, which resulted in four gender-diverse dyads and one dyad of females. All participants signed the corresponding informed consent form. All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Fig. 2.1 Experimental setup

Procedure

The experiment consisted of working collaboratively with a wooden puzzle. It was divided into three consecutive tasks, each one with the same 3-D wooden puzzle of a dinosaur but with a different set of instructions (Fig. 2.1). The total length of the experiment was 45 min, including 5 min of baseline before Tasks 1, 2, and 3. Every task lasted 5 min and was followed by a 3-min break to fill in a survey on perceived team cohesion (PTC) (see Table 2.3).

Table 2.3 Experiment procedure

The following task instructions were given to the participants: For Task 1, the participants were given the puzzle and its cover (see Fig. 2.2), without any other instruction. For Task 2, they were given two different written instructions to assemble the puzzle through creative collaboration. For Task 3, they were given no instructions but an assembly guide of the puzzle (see Fig. 2.2).

Fig. 2.2 Dinosaur Set 1: cover sheet (left side), which served as the visual instructions for Task 1, and one of the pages of the assembly guide (right side), provided to the participants at the beginning of Task 3

Data Collection

The data collected during the experiment consisted of perceived team cohesion, electrodermal activity (EDA), heart rate (HR), and video, from which we extracted the facial expressions of the participants. For the collection of EDA and HR data from the participants, we used the Empatica E4 wristband [19]. The data collection was done using a stationary setup; no audio-visual staff was present during the recording.

Perceived Team Cohesion

To measure the perception of the participants for each task, we used a self-report questionnaire. The “Perceived Team Cohesion Questionnaire” (PTCQ) has 10 questions and was answered individually by every participant after every task using a Likert scale (Table 2.4). The questionnaire was adapted from the paper “Physiological evidence of interpersonal dynamics in a cooperative production task” [20].

The changes in the perception of collaborative work collected with the PTCQ serve two purposes. First, they were used to validate that the experiment design and intervention did actually generate a change, especially in Task 2, and that the change was perceived by the participants. The second purpose was to study correlations between PTC and the synchrony of physiological signals and facial expressions.

Table 2.4 Perceived team cohesion questionnaire (PTCQ)

Physiological Data

Electrodermal activity (EDA)

According to Boucsein [21], EDA refers to the electrical potential on the surface of the skin, which is controlled by the sympathetic nervous system: increases in sudomotor innervation cause EDA to increase and perspiration to occur. Quick changes in EDA, known as arousals, are a response to stress, temperature, or exertion and have been frequently used in studies related to affective phenomena and stress [22]. In this preliminary study, synchrony between participants is calculated based on the similarity of arousal peaks, technically known as skin conductance responses (SCR). An SCR typically lasts between 1 and 5 s, has a steep onset and an exponential decay, and reaches an amplitude of at least 0.01 µS [21].

For the collection of EDA data from the participants, we used the Empatica E4 wristband [19], which collects EDA at a frequency of 4 Hz by using two electrodes on the skin.

Before the SCR can be analyzed, the EDA data need to be processed. The raw EDA signal consists of a phasic and a tonic component. To study the synchronization between the SCR (the phasic component) of two signals, the tonic component needs to be separated out. To do so, the raw data for each participant were visualized using Ledalab and a continuous decomposition analysis was run [23]. Because EDA varies between individuals and to reduce noise, we smoothed the data and normalized the signal to z-scores (see Fig. 2.3).

Fig. 2.3 Data processing of raw electrodermal activity (EDA) data included tonic extraction using continuous decomposition analysis (CDA) and data normalization. After that, the data corresponding to Tasks 1, 2, and 3 were manually extracted

The phasic EDA sheet has two columns: the timestamp and the amplitude of the signal. Using the timestamp column, we manually cut the data to the time interval (in seconds) corresponding to each task. The data of the two participants for each task were plotted in one graph to analyze the synchronization between them, as described in Section “Data Analysis”.
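The preprocessing steps described above (smoothing, z-score normalization, and cutting by timestamp) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the centered moving average, the function names, and the sample values are all hypothetical, and the decomposition into phasic and tonic components (done with Ledalab in the study) is taken as already completed.

```python
from statistics import mean, pstdev


def moving_average(signal, width=5):
    """Smooth with a centered moving average; the window shrinks at the edges.

    Assumption: the study does not report its smoothing method.
    """
    half = width // 2
    return [mean(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]


def z_score(signal):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        raise ValueError("a constant signal cannot be z-scored")
    return [(x - mu) / sigma for x in signal]


def cut_interval(timestamps, values, start_s, end_s):
    """Keep the samples whose timestamp (in seconds) lies in [start_s, end_s)."""
    return [v for t, v in zip(timestamps, values) if start_s <= t < end_s]


# Hypothetical phasic EDA sampled at 4 Hz (0.25 s steps), values in microsiemens.
timestamps = [i * 0.25 for i in range(16)]
phasic = [0.02, 0.03, 0.10, 0.30, 0.22, 0.15, 0.09, 0.05,
          0.03, 0.02, 0.04, 0.18, 0.25, 0.12, 0.06, 0.03]

# Normalize the whole recording first, then cut out one (hypothetical) task interval.
normalized = z_score(moving_average(phasic, width=3))
task_segment = cut_interval(timestamps, normalized, start_s=1.0, end_s=3.0)
print(len(task_segment))  # 8 samples: 2 s at 4 Hz
```

Normalizing before cutting follows the order given in the caption of Fig. 2.3; normalizing each task segment separately would be a different design choice and would remove between-task differences in level.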

Heart Rate (HR)

The second physiological measure is HR, which is derived from the interbeat intervals (IBI); variation in the IBI is important in estimating vagal tone and parasympathetic nervous system activity.

To collect HR, we used the Empatica E4 wristband, which uses a photoplethysmography sensor to illuminate the skin and measure the light reflected in the presence of oxyhemoglobin. According to Garbarino and colleagues [19], with each cardiac cycle the heart pumps blood to the periphery, and the resulting change in volume and pressure correlates with a change in the concentration of oxyhemoglobin.

Before the analysis of synchrony between participants, each raw HR series was normalized by dividing it by its mean, in order to reflect changes in HR relative to a baseline (the mean). We manually cut the data to the time interval (in seconds) corresponding to every task (Fig. 2.4). The data of the two participants, for every dyad and for each task, were plotted in one graph to analyze the synchronization between them, as described in Section “Data Analysis”.

Fig. 2.4 Data processing of raw heart rate (HR) data: normalization by mean extraction. After that, the data corresponding to Tasks 1, 2, and 3 were manually extracted
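As a small illustration of the HR normalization described above, the sketch below divides a series by its own mean, so that a value of 1.0 means "at the participant's mean" and 1.05 means "5 percent above it". The function name and the sample values are hypothetical.

```python
from statistics import mean


def normalize_by_mean(hr):
    """Divide every sample by the series mean (the baseline in this scheme)."""
    mu = mean(hr)
    return [x / mu for x in hr]


# Hypothetical HR samples in beats per minute for one participant.
hr_bpm = [72.0, 75.0, 78.0, 81.0, 84.0, 78.0]
relative = normalize_by_mean(hr_bpm)
```

Because every participant is scaled by their own mean, two partners with different resting heart rates become comparable in terms of relative change, which is what the synchrony analysis needs.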

Facial Expressions

Our faces offer a rich source of pragmatic social cues. From facial expressions we communicate and infer emotions and intentions; they serve as a visual guide on how to act during social interactions and encounters with others [24]. Previous research on facial mimicry considers it a “basic facet of social interaction, theorized to influence emotional contagion, rapport, and perception and interpretation of others’ emotional facial expressions” [25].

In this study, we use computer vision to analyze facial action units (FAU), a coding system based on the muscles of the face, whose activity is correlated with certain emotional states.

Video footage of the experiment was captured using a 360° video camera. For every task, a free capture of the face of each participant was extracted from the 360° video. Every video was analyzed using the open-source application OpenFace 2.0 [26] for automatic facial behavior analysis. Based on the confidence output of the software, noisy data were removed (less than 5 percent of the total frames). The analysis runs at 30 frames per second, which provides rich, high-granularity data. The FAU analysis provides two values for every FAU: presence and intensity.
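The confidence-based cleaning step could look roughly like the sketch below. It is only an illustration: the field names are modeled on the per-frame `confidence` and `success` values that OpenFace reports, the frames are represented as plain dictionaries rather than read from a file, and the threshold of 0.8 is a placeholder, since the study does not report the cut-off it used.

```python
def drop_low_confidence(frames, threshold=0.8):
    """Keep frames where tracking succeeded and confidence is high enough.

    Returns the kept frames and the share of frames that were removed.
    The threshold is an assumption, not the value used in the study.
    """
    kept = [f for f in frames if f["success"] == 1 and f["confidence"] >= threshold]
    removed_share = 1 - len(kept) / len(frames)
    return kept, removed_share


# Hypothetical data: 40 frames, one of which was tracked poorly.
frames = [{"frame": i, "confidence": 0.98, "success": 1} for i in range(1, 41)]
frames[10] = {"frame": 11, "confidence": 0.35, "success": 0}

clean_frames, removed = drop_low_confidence(frames)
```

Reporting the removed share alongside the results, as the text does, makes it easy to check that cleaning did not discard a large part of a task.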

Before the synchrony analysis, the raw data were coded into positive, negative, and neutral facial expressions based on the FAU values. A positive expression was coded when AU_12 was present, with the amplitude at those points given by the average intensity of AU_06 and AU_12. A negative expression was coded when AU_15 and AU_01 were present, with the amplitude given by the average intensity of AU_15, AU_01, and AU_04. To make the negative expressions appear as negative values in the plot, we multiplied that set by −1. Positive, negative, and neutral data points for the two participants for each task were plotted in one graph to analyze the synchronization between them, as explained in Section “Data Analysis” (Fig. 2.5).

Fig. 2.5 Facial expressions were extracted using automatic facial behavior analysis based on computer vision and coded into positive and negative emotions based on FAU presence and intensity. The intensity values were plotted together to analyze synchrony
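The coding rule for facial expressions can be written down compactly. The sketch below is a hedged reading of the rule as stated in the text; the function name, the dictionary layout, and two details are assumptions that the text does not settle: neutral frames are given an amplitude of 0, and the positive rule is checked first when both rules would apply.

```python
def code_expression(presence, intensity):
    """Return (label, amplitude) for a single video frame.

    presence:  maps an action unit name to True/False (AU detected or not)
    intensity: maps an action unit name to its estimated intensity
    """
    # Positive: AU12 present; amplitude is the mean intensity of AU06 and AU12.
    if presence["AU12"]:
        return "positive", (intensity["AU06"] + intensity["AU12"]) / 2
    # Negative: AU15 and AU01 present; amplitude is the mean intensity of
    # AU15, AU01, and AU04, multiplied by -1 so it plots below zero.
    if presence["AU15"] and presence["AU01"]:
        avg = (intensity["AU15"] + intensity["AU01"] + intensity["AU04"]) / 3
        return "negative", -avg
    # Assumption: everything else counts as neutral with zero amplitude.
    return "neutral", 0.0


# Hypothetical frames.
smiling = code_expression(
    {"AU12": True, "AU15": False, "AU01": False},
    {"AU06": 1.0, "AU12": 2.0, "AU15": 0.0, "AU01": 0.0, "AU04": 0.0},
)
frowning = code_expression(
    {"AU12": False, "AU15": True, "AU01": True},
    {"AU06": 0.0, "AU12": 0.0, "AU15": 1.5, "AU01": 1.0, "AU04": 0.5},
)
```

Applying the function to every frame yields one signed amplitude series per participant, from which the positive and negative parts can be taken separately for the synchrony analysis.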

Data Analysis

Synchrony of Physiological Signals

Physiological interpersonal synchronization for every dyad was calculated using dynamic time warping (DTW) [27]. In particular, we analyzed the synchrony of the EDA signals, the HR signals, and the facial expressions (positive and negative separately) between the two partners in a dyad. DTW is an algorithm for measuring the similarity between two temporal sequences that may vary in timing and speed; it provides the distance between the partners’ signals for each task. Interpersonal influence in social interactions normally occurs within a five-second timeframe [27]. For that reason, we enforced a locality constraint of five seconds while searching for the nearest points between the signals. As the frequency of the EDA signals is 4 Hz in our case, the constraint window consists of 20 samples within which we searched for the similarity. For HR, the five-second window corresponded to five data points. For the facial expressions, the window size corresponded to 150 data points at 30 samples per second (Figs. 2.6 and 2.7).

Fig. 2.6 EDA plot of signals from participants 1 and 2 corresponding to dyad 2 in Task 3

Fig. 2.7 DTW measures the distance between the two signals to provide a physiological synchronization coefficient based on EDA corresponding to dyad 2 in Task 3
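To make the windowed DTW computation concrete, the following is a minimal sketch in plain Python. It is not the implementation used in the study: the absolute difference as local cost, the band-style locality constraint (a sample may only be matched to samples at most `window` positions away), and all names and sample values are assumptions made for illustration.

```python
import math


def window_in_samples(seconds, sampling_hz):
    """Convert a time window to a number of samples."""
    return int(round(seconds * sampling_hz))


def dtw_distance(a, b, window):
    """DTW distance between two sequences with a locality constraint.

    Only pairs (i, j) with |i - j| <= window are considered, so a sample
    can only be aligned with samples that are close to it in time.
    """
    n, m = len(a), len(b)
    window = max(window, abs(n - m))  # the band must be able to reach the end
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Five-second windows at the three sampling rates mentioned in the text.
eda_window = window_in_samples(5, 4)     # 20 samples
hr_window = window_in_samples(5, 1)      # 5 samples
face_window = window_in_samples(5, 30)   # 150 samples

# Hypothetical signals: the second one is the first delayed by one sample.
partner_a = [0, 0, 1, 2, 1, 0, 0, 0]
partner_b = [0, 0, 0, 1, 2, 1, 0, 0]
with_warping = dtw_distance(partner_a, partner_b, window=1)
without_warping = dtw_distance(partner_a, partner_b, window=0)
```

With `window=0` the alignment is forced to be sample by sample, so the delayed response is counted as a mismatch; allowing even a small window lets the delayed peak be matched to its counterpart, which is why the window is tied to the timescale of interpersonal influence.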

Once we had the synchrony coefficient of every dyad for Tasks 1, 2, and 3, we looked for the correlation between the different signals and the perceived team cohesion (PTC) data from the survey using Pearson’s correlation coefficient (PCC) [28]. The PCC ranges from −1 (perfect negative, or inverse, correlation) to 1 (perfect positive correlation).

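The synchrony coefficient was normalized to z-scores and multiplied by −1 to ensure comparability with PTC: because DTW yields a distance, lower raw values mean higher synchrony, and the sign flip makes higher values indicate higher synchrony.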
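The normalization, sign flip, and correlation steps can be illustrated with a short sketch. The numbers are invented and the helper names are hypothetical; the example only shows the mechanics, not results from the study.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("constant values cannot be z-scored")
    return [(v - mu) / sigma for v in values]


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical DTW distances and PTC z-scores of one dyad for Tasks 1, 2, and 3.
dtw_distances = [12.0, 30.0, 15.0]
ptc = [0.5, -1.2, 0.7]

# Flip the sign so that larger values mean more synchrony, not more distance.
synchrony = [-z for z in z_scores(dtw_distances)]
r = pearson(synchrony, ptc)
```

Pearson's coefficient is unaffected by the z-scoring itself; only the sign flip changes the result, turning a negative correlation between distance and cohesion into a positive correlation between synchrony and cohesion.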

Perceived Team Cohesion Analysis

The data from the PTCQ were averaged between dyads and then normalized across tasks to z-scores.
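One possible reading of this aggregation is sketched below. It is an interpretation, not the documented procedure: the sketch assumes that each participant's ten item responses are averaged, that the two partners' scores are then averaged into one dyad score per task, and that the three task scores are finally converted to z-scores. The function name, the data layout, and the 5-point responses are hypothetical.

```python
from statistics import mean, pstdev


def dyad_ptc(responses_by_task):
    """Turn questionnaire responses into one z-scored PTC value per task.

    responses_by_task maps a task name to a pair of lists holding the item
    responses of partner A and partner B (assumed layout).
    """
    raw = {
        task: mean([mean(partner_a), mean(partner_b)])
        for task, (partner_a, partner_b) in responses_by_task.items()
    }
    scores = list(raw.values())
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        raise ValueError("identical scores in all tasks cannot be z-scored")
    return {task: (score - mu) / sigma for task, score in raw.items()}


# Hypothetical responses of one dyad on a 5-point scale, 10 items per task.
responses = {
    "task1": ([4] * 10, [5] * 10),
    "task2": ([2] * 10, [3] * 10),
    "task3": ([4] * 10, [4] * 10),
}
ptc_z = dyad_ptc(responses)
```

Normalizing across tasks within a dyad expresses each task relative to that dyad's own average, which matches how the synchrony coefficients are normalized before the two are correlated.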

Results

Because of the preliminary nature of this study, its main goal is to explore and test new methods to study social norms in collaboration. In the following paragraphs, we present some initial findings, making clear to the reader that the results are not meant to be conclusive but rather illustrative of the expected results (Fig. 2.8).

Fig. 2.8 Synchrony coefficients z-scores per index plus PTC

A general overview of the data shows consistency in the changes of interpersonal synchrony levels due to the experiment design and intervention. Of special interest to the authors were the changes between Tasks 1 and 2 and between Tasks 2 and 3. Tasks 1 and 3 were designed to have a high level of normative agreement with a shared goal. In contrast, the intervention in Task 2 was meant to misalign the participants’ goals by providing different instructions before the task. This served as a first validation of the intervention design and the methods employed.

Regarding physiological synchrony, the correlation between EDA synchrony and HR synchrony is positive and shows a direction that could be further explored with a larger data set. The correlations involving EDA synchrony are not strong enough to be considered. Interestingly, EDA synchrony tends to correlate negatively with negative facial expression synchrony. For HR synchrony, a positive correlation with PTC (r = 0.344) can be seen, which is also in agreement with the intervention design.

The correlation between positive and negative facial expression synchrony is strongly negative. This correlation can be explained by the experiment design: Tasks 1 and 3 involve agreement and shared goals, whereas Task 2 involves disagreement and misalignment, which is associated with more negative facial expressions. In the same direction, the correlation between facial expression synchrony and PTC is positive for positive facial expressions and negative for negative expressions. The consistency between facial expressions and PTC suggests that facial expressions are a potentially reliable indicator of normative agreement (Table 2.5).

Table 2.5 Descriptive statistics and correlations

Discussion

As a preliminary study investigating the quantitative correlations between different levels of normative agreement, levels of perceived team cohesion, physiological synchronization, and facial expression synchrony, the experiment provides a proof of feasibility for further research with a larger group of participants and more means of measuring synchronization to understand social norms. Quantifying social norms opens up new opportunities for research. Norms are at the core of human collaboration; therefore, advancing our understanding of social norms is key to understanding any form of collaboration.

The enactive-ecological approach, as a theoretical framework, provides a flexible yet rigorous and solid foundation from the cognitive sciences and philosophy. A strong advantage of understanding group-based norms as socio-material affordances, available in the form of pragmatic social cues, is that it broadens the scope of the study of normativity in collaboration. It moves away from a “head-locked” approach to social norms and brings in embodied and environmental aspects that can contribute to a better understanding of the mechanisms underlying our unique ability for normativity and collaboration. This approach opens up the scope to socio-material environmental aspects that have not traditionally been considered in social studies.

In the same direction, new computer-powered data collection methods such as wearable devices have become unobtrusive and accessible. This technological advancement in sensing devices enables larger studies with cost-effective data collection. Regarding data analysis, open-source software and machine learning can provide fast and efficient methods for the analysis of large data sets. Without them, the preliminary experiment presented in this study would have required a great amount of manpower, in terms of time and knowledge, for the manual coding and labeling of data. Thanks to reliable and rigorous methods, studies that integrate multiple data sources can contribute to mapping the bigger picture of group-based norms in collaboration.

Future Research and Limitations

Social norms are invisible and dynamic, but people can navigate them quite successfully thanks to the pragmatic social cues found in the environment. This preliminary study focused on the use of computer-powered automatic data collection and analysis methods as a means of understanding social norms. The study of social norms will remain incomplete if research cannot integrate the multiple modalities of the pragmatic social cues. We believe quantitative methods can contribute to bridging that gap in a time- and cost-effective manner. The limitation of such an extensive approach is that it sacrifices fidelity to the phenomena when compared to qualitative methods. Future research will focus on other social cognitive vehicles such as body posture, gestures, eye movement, head pose, and communicative actions to gain a better understanding of social norms in a fashion similar to how people do. Adding these different pieces together can give us a map of the coupling structures of social norms.