Introduction

The direction of the entire body or body parts of partners plays significant roles in the social life of animals, including humans (Homo sapiens), as a cue about partners' attention (Emery 2000; Kobayashi and Kohshima 2001; Laidlaw et al. 2016; Langton et al. 2000). In particular, the important role played by the orientation of the head, face, and eyes in several species, especially primates, has attracted researchers' interest (see Emery 2000; Kleinke 1986 for review). At least in humans, and possibly in hominid species, face-to-face configurations and eye-to-eye contact are thought to provide a platform for building qualitatively distinct social relationships between interactants (humans: Stawarska 2010; Trevarthen and Aitken 2001; chimpanzees, Pan troglodytes: Tomonaga et al. 2004; western lowland gorillas, Gorilla gorilla gorilla: Gómez 2010). That is, face-to-face configurations through eye contact can give rise to intersubjectivity (Gómez 1994; Stawarska 2010). Partners are regarded as being intersubjectively engaged in an interaction in the following situations: at a minimum, A is attending to B who is attending to A and is ready for a coordinated interaction, and vice versa; or, in more detail, A is aware both that B is attending to A and that B is aware that A is attending to B, and vice versa (Bard and Leavens 2009; Gómez 1994, 1996; Stawarska 2010; Susswein and Racine 2008). Some have argued that intersubjectivity in this sense precedes joint attention ontogenetically and is a necessary precondition for it (Bard and Leavens 2009; Brinck 2008; Trevarthen and Aitken 2001). Although it is widely debated whether joint attention exists only in humans or also in some great apes and other animals (e.g., Bard and Leavens 2009; Tomasello 2008), the phylogenetic distribution of intersubjectivity in the above sense is unclear. 
It is important to address this issue to understand the evolutionary elaboration of communication in the animal kingdom and the evolutionary background of human communication. Using data from free-ranging Japanese macaques (Macaca fuscata), this study examines whether face-to-face configurations serve as a basis for intersubjective engagement in the above sense in a non-hominid primate.

In both hominids and non-hominids, face-to-face or eye-to-eye contact is made between partners in a variety of contexts. For example, face-to-face interaction has been well documented in the context of mother–infant communication (humans: Fogel et al. 1999; chimpanzees: Bard et al. 2005; rhesus macaques, Macaca mulatta: Dettmer et al. 2016; Ferrari et al. 2009; capuchin monkeys, Sapajus libidinosus: Verderane et al. 2020). Eye contact also functions to reduce tension and invite non-aggressive interactions (mountain gorillas, Gorilla beringei beringei: Yamagiwa 1992), and sets the stage for reconciliation (chimpanzees: de Waal and Roosmalen 1979). Eye contact can also serve as a sexual attractant and reinforce sexual bonds. For example, in bonobos (Pan paniscus), copulation and genito-genital (GG) rubbing are accompanied by eye contact (Kano 1980; Kitamura 1989; Savage and Bakeman 1978). Eye contact in stump-tailed macaques (Macaca arctoides) increases sexual arousal in males (Linnankoski et al. 1993), and females induce copulation by making eye contact with males (Chevalier-Skolnikoff 1974).

The use and function of face-to-face interaction through eye contact are not consistent among primates, and inter-species differences have been identified (e.g., Harrod et al. 2020; Thomsen 1974). It has been argued that these inter-species differences are, to an extent, related to species-specific social structures. In particular, the use of eye contact as a threat signal in species with despotic societies has been contrasted with its use as an affiliation signal in species with egalitarian societies (e.g., Harrod et al. 2020). For example, eye contact in Western, educated, industrialized, rich, and democratic (WEIRD) human populations (Henrich et al. 2010) is thought to signify an affiliative stance toward partners (Argyle and Dean 1965; Exline et al. 1965; Exline and Winters 1965). Bonobos, which are more tolerant and less despotic than chimpanzees, make more eye contact than chimpanzees (Kano et al. 2015). Despite these cases of the affiliative use of gaze signals in non-despotic populations and species, the predominant response to gaze in non-human primates is an expression of subordination or gaze aversion (Lorenz 1966; Perrett and Mistlin 1990; van Hooff 1967). Researchers have assumed that face-to-face and eye-to-eye contact are used less frequently in despotic species probably because they convey competitive and agonistic messages in these species (e.g., Emery 2000; Redican 1975). For example, in Japanese macaques, staring with an open mouth serves as a threat to conspecifics (Huffman 1987; Chapais 1988). Rhesus macaques avoid mutual gaze even before reconciliation (de Waal and Yoshihara 1983). In gray mouse lemurs (Microcebus murinus), prolonged eye contact occurs when they are initiating an aggressive encounter (Coss 1978).

There are not only inter-species differences in the use and meaning of face-to-face and eye-to-eye contact, but also within-species differences. That is, face-to-face and eye-to-eye contact do not have a single, species-specific communicative message that is fixed somewhere on the agonistic–affiliative axis; rather, their use and meaning change even within species depending on the social context. Humans make more eye contact when they are in a cooperative relationship than in a competitive one (Exline 1963). For example, eye contact is less likely to occur when a person is in the presence of someone who has recently deceived them (Exline et al. 1970). Yamagiwa (1992) examined the use of eye contact in a group of gorillas and showed that social staring can function as an invitation to non-aggressive interactions or as an appeasement to reduce social tensions, depending on the social context.

In an actual social situation, face-to-face and eye-to-eye contact may have the following two functions. First, it creates a situation in which partners are actively and intersubjectively engaged with each other (Gómez 1994, 2010; Stawarska 2010) and indicates that a communication channel is open between them (Argyle and Dean 1965). Second, it conveys a functional message that can be specified somewhere on the agonistic–affiliative axis based on social relations and contexts. Note, however, that these two functions may exist independently. Studies of the face-to-face configuration and eye contact in humans have examined not only their function as affiliative or agonistic signals (e.g., Argyle and Dean 1965; Exline et al. 1965; Exline and Winters 1965), but also their role in intersubjective engagement. Indeed, before starting a conversation, humans in WEIRD populations mutually direct their faces and eyes toward each other (e.g., Mondada 2009). Moreover, eye contact or a face-to-face configuration in humans clarifies who is available as a participant in the interaction (Kendon 1990), reflects a willingness to initiate and maintain interaction (Exline et al. 1965), and serves as a signal indicating that the interactants are mutually attentive and engaged (Cary 1978; Laidlaw et al. 2016). Eye contact in humans also acts as a signal to facilitate ongoing interactions (Kleinke et al. 1975) and coordinates turn-taking in conversation by soliciting the partner's response (Bavelas et al. 2002; Rossano 2013). In mother–infant interactions, it serves as a cue to assess interactants’ intent and enhance their responsiveness (Stern 1985). By contrast, studies in non-human animals, except a few studies on great apes (Bard et al. 2005; Bard and Leavens 2009; Gómez 1994, 2010), have been conducted predominately to examine their function as affiliative or agonistic signals.

By comparing interactions initiated with or without the face-to-face configuration or eye contact, we can clarify whether the face-to-face configuration and eye contact function to establish mutual engagement for subsequent interactions. As described in previous research on humans, interactions involving face-to-face configuration are characterized by symmetric engagement in which both interactants are mutually involved in social exchange and actively affect each other (Hsu and Fogel 2001; Sansavini et al. 2015). This study compared dyadic play-fighting sessions of Japanese macaques preceded and not preceded by a face-to-face configuration, to determine whether a face-to-face configuration in a non-hominid primate serves to establish mutual engagement for subsequent interactions.

Play fighting is a social interaction commonly observed in immature mammals (Eaton et al. 1986; Reinhart et al. 2010). Generally, play fighting is a competitive but non-agonistic social interaction; players use behavioral patterns that are also used in serious aggression, such as biting, grabbing, and wrestling (Bauer and Smuts 2007; Burghardt 2005; Palagi et al. 2016) and compete for an advantage over their playmates (Aldis 1975; Himmler et al. 2013; Reinhart et al. 2010). These pseudo-aggressive behavior patterns are applied gently so as not to injure the playmates, but they can be misleading and sometimes escalate to a serious fight (Palagi et al. 2016). In the widely accepted definition (Burghardt 2005), play is spontaneous, endogenously motivated, and performed for its own sake (Held and Špinka 2011). Therefore, social play, such as play fighting, cannot occur unless both individuals are readily engaged and voluntarily participate in the interaction. If one of the players becomes reluctant, the play is terminated immediately. Players exchange a variety of signals during play, to maintain it and avoid misunderstanding (Heesen et al. 2017; Palagi et al. 2016). For example, rapid mimicry of play signals has been shown to prolong play sessions (chimpanzees: Davila-Ross et al. 2011; domestic dogs, Canis lupus familiaris: Palagi et al. 2015; geladas, Theropithecus gelada: Mancini et al. 2013; Tonkean macaques, Macaca tonkeana: Scopa and Palagi 2016; meerkat, Suricata suricatta: Palagi et al. 2019). Based on the above, mutual engagement by the participants seems to be more critical in play fighting than in other contexts (Brownell 2011; Heesen et al. 2017; Palagi et al. 2016). Moreover, as play fighting can be initiated with or without a face-to-face opening, it seems to provide a good model to test whether the face-to-face configuration serves to establish mutual intersubjective engagement. 
In various species, play fighting is initiated when the players are in a face-to-face configuration and making direct eye contact (domestic dogs, wolves, Canis lupus, and coyotes, Canis latrans: Bekoff 1974; orangutans, Pongo pygmaeus: Rijksen 1978; mountain gorillas: Yamagiwa 1992; chimpanzees: Fröhlich et al. 2016; Japanese macaques: Iki and Hasegawa 2020; Fig. 1a). By shaping a situation in which each interactant is attending to the partner, the face-to-face opening may serve as a platform for active engagement in subsequent play fighting and enhance symmetry within play. On the other hand, play fighting can be initiated without a face-to-face opening, although this variation has received less research attention (Heesen et al. 2017). In this case, play is initiated by one individual making a surprise attack from behind on an inattentive partner (Iki and Hasegawa 2020; Fig. 1b). In previous research comparing play sessions initiated with a face-to-face opening (FF-initiated play) with play sessions without one (non-FF-initiated play) in Japanese macaques, we found that the former lasted longer than the latter and that in non-FF-initiated sessions, the individual who initiated a session by performing a surprise attack from behind a partner primarily took an offensive role (Iki and Hasegawa 2020). However, whether the face-to-face configuration serves to establish the mutual engagement of both interactants for subsequent interaction remains unclear. In this study, by comparing FF-initiated and non-FF-initiated play, we examined whether the former is characterized by a mutual engagement that is absent in the latter.

Fig. 1

a A sequence showing the face-to-face opening of play fighting. The individuals were facing each other (left figure), and then playful contact was made (right figure). b A sequence showing the non-face-to-face opening of play fighting. One individual (i.e., initiator) approached the other (i.e., the other) from behind (left figure) and then playfully grabbed the other's body (right figure). c A schematic illustration of the sequence of an FF-initiated play session, described as state transitions, and calculation of the total duration of asymmetric attacks by the first attacker. SYM symmetric attack, ASYMF asymmetric attack by the first attacker, ASYMO asymmetric attack by the other player, NO no attack

Specifically, we examined the following predictions. As mentioned, in play fighting, players compete to gain an advantage over their playmates (e.g., Aldis 1975). Hence, the proportion of time during which each player maintains an advantage over the playmate is likely to reflect the degree of active engagement in play by each player (Iki and Hasegawa 2020). We regarded a player as having an advantage if they unilaterally attacked (Pellis and Pellis 1987, 1997) or pinned down the partner (Biben 1986; Bauer and Smuts 2007). An attack was regarded as unilateral if the attacker's face or belly was oriented toward the partner, whose entire body was oriented away from the attacker. Pinning down was considered to have occurred when a player in a standing or sitting position leaned on the partner, causing the partner to lie down in a prone, supine, or lateral position. We calculated the degree of engagement asymmetry between players for each play session based on the difference in the amount of time for which each player maintained an advantage over their opponent. If face-to-face configurations serve to establish mutual engagement for subsequent interactions, then play sessions with a face-to-face opening would be expected to have a lower degree of engagement asymmetry than play sessions without one (Prediction 1). However, as our definition of play asymmetry reflects the difference in advantage duration between players, the value of play asymmetry may also decrease when neither individual is actively engaged in play and neither gains an advantage. To rule out this possibility, and confirm that the individual who faced the partner when play began was actively engaged in play to gain an advantage, we also made the following predictions regarding the duration of asymmetric attacks in which one of the players gained an advantage over the other.
In the non-FF-initiated play, only the play initiator, who was unilaterally facing the partner at the start of the play session, would be expected to play vigorously and try to maintain an advantage over the playmate. Hence, in non-FF-initiated play, the duration of an asymmetric attack by the play initiator would be longer than that by the other player (Prediction 2a). Conversely, in the FF-initiated play session, symmetric engagement by both individuals was expected. Since there is no unique "initiator" in FF-initiated play, which starts with both players facing each other, we labeled the first player who gained a unilateral advantage over the opponent in each session as the "first attacker". If face-to-face configurations function to establish symmetric engagement of the interactants, then the duration of asymmetric attacks in which the first attacker gains a unilateral advantage over the playmate would be no different from the duration of asymmetric attacks by the other player (Prediction 2b).

Material and methods

Study site and subjects

This study of a free-ranging provisioned group of Japanese macaques was conducted at Jigokudani Monkey Park in Shiga-Heights, Nagano Prefecture, Japan. Demographic records of the group have been kept by the park staff since 1962. Group size has varied from 213 to 211 individuals, due to deaths. In January 2017, there were 14 adult males (aged > 4 years), 93 adult females (aged > 3 years), 30 juvenile males (aged 1–4 years), 38 juvenile females (aged 1–3 years), 18 infant males, and 20 infant females. The group was fed barley, soybeans, and apples four times daily (09:00, 12:00, 15:00, and 16:30) by the park staff. For a detailed description of the research site, see Wada and Ichiki (1980). To exclude the effects of sex and age differences, we focused on play bouts between two individuals of the same sex and age. Our study subjects were 0- to 2-year-old males participating in play fighting (0 years old, infants; 1 and 2 years old, small juveniles). Since immature males play more frequently than immature females (Eaton et al. 1986) and prefer to play with same-sex partners (Glick et al. 1986), we selected immature males for efficient data collection. All of the infants born between April 2016 and June 2016 were between 7 and 9 months of age at the beginning of the observation and had begun to feed on their own, but had not been fully weaned. Our subject group included 18, 11, and 13 males aged 0, 1, and 2 years, respectively.

Data collection

The first author conducted behavioral observations during January–March 2017 (40 days). The observer stood in specific locations where individuals of the study group could be observed, and recorded all visible bouts of play fighting on video. To avoid observation bias, the observer regularly changed the observation location. Since focal sampling is not sufficiently efficient for relatively infrequent behaviors such as play fighting (Martin and Bateson 2007), we did not use focal sampling. When multiple play bouts occurred at the same time, we focused on pairs within the target age group who had the fewest observations. We conducted video recording from 09:00 to 16:30 using a digital video camera (HDR-TD10; Sony Corporation, Tokyo, Japan). Animals were not observed for 30 min before and 30 min after feeding times. We maintained a distance of at least 1.5 m from the subjects at all times. Because age differences between interactants might affect which player had an advantage over the other or influence the degree of engagement in play, we only included play sessions between two individuals of the same age in our analysis. Additionally, all play bouts met the following requirements (Reinhart et al. 2010; Iki and Hasegawa 2020): the entire interaction, from initiation to conclusion, occurred on relatively flat ground, not in three-dimensional environments such as in trees or on fences; there was at least one bite but no behavioral elements that indicated serious fighting (e.g., screaming or bared-teeth display); and no objects, such as leaves or rocks, were involved. Play bouts preceded by grooming, contact-sitting, mounting, or play chasing within the 10 frames (about 0.333 s) immediately before the first playful attack were excluded because these behaviors might have affected interactants’ engagement in play. Hence, we analyzed only FF- and non-FF-initiated play bouts. 
Overall, approximately 117 h of video data were collected covering 31 dyads of 0-year-old males, 20 dyads of 1-year-old males, and 15 dyads of 2-year-old males. The dataset consisted of 185 play bouts that met the above requirements (n = 71, 46, and 68 for 0-, 1-, and 2-year-old subjects, respectively).

Video coding

We performed frame-by-frame (30 fps) video analyses using Behavioral Observation Research Interactive Software (BORIS; Friard and Gamba 2016). We defined the initiation of a play bout as the moment when an individual directed a playful attack (i.e., biting, grabbing, pushing, slapping, wrestling) at his partner. Following previous studies (e.g., Beltrán Francés et al. 2020; Biben 1986; Biben and Symmes 1986; Mancini and Palagi 2009; Palagi and Mancini 2011; Scopa and Palagi 2016), we considered a play bout to have ended when partners stopped play for more than 10 s. We categorized the beginning of each bout as FF or non-FF (Fig. 1a and b). We regarded a bout as FF-initiated play when individuals faced each other within 10 frames (about 0.333 s) immediately before the first playful attack. Conversely, we regarded a bout as non-FF-initiated play if one individual directed his face at his partner and initiated play from behind his partner while his partner was looking away. For every 10-frame window (about 0.333 s), we determined which player, if either, held an advantage, and coded the state of play interaction as one of the following three categories:

Symmetric attack Players attacked each other and neither gained an advantage.

No attack Neither player delivered any attacks.

Asymmetric attack One of the players had an advantage over his partner.

If the state of the current 10 frames was the same as the state of the previous 10 frames, the state was considered continuous. A state immediately prior to the first playful contact in each session was labeled the "start state". Following these procedures, the sequence of play fighting can be described in terms of the transitions between the abovementioned states, beginning with "start state" (Fig. 1c). Our dataset contained 511 (n = 166, 128, and 217 for 0-, 1-, and 2-year-old subjects, respectively) and 82 (n = 46, 25, and 11 for 0-, 1-, and 2-year-old subjects, respectively) asymmetric attacks performed in FF-initiated and non-FF-initiated play, respectively.
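The continuity rule above amounts to run-length encoding of the 10-frame codes. The following is a minimal sketch of that step, not the authors' actual pipeline: the state labels follow the abbreviations in Fig. 1c, while the function names, the `START` label, and the example sequence are hypothetical and used only for illustration.

```python
from itertools import groupby

FPS = 30            # frame rate of the video analysis, as stated in the text
WINDOW_FRAMES = 10  # states were coded every 10 frames (about 0.333 s)


def collapse_states(window_states):
    """Merge consecutive identical window codes into (state, duration_s) runs."""
    runs = []
    for state, group in groupby(window_states):
        n_windows = sum(1 for _ in group)
        # Duration of the run in seconds: windows * frames per window / fps.
        runs.append((state, n_windows * WINDOW_FRAMES / FPS))
    return runs


def transitions(runs, start_label="START"):
    """List (previous_state, next_state) pairs, beginning with the start state."""
    labels = [start_label] + [state for state, _ in runs]
    return list(zip(labels[:-1], labels[1:]))


# Hypothetical coded bout: one label per 10-frame window (not real data).
example = ["ASYMF", "ASYMF", "ASYMF", "SYM", "SYM", "NO",
           "ASYMO", "ASYMO", "ASYMO"]
runs = collapse_states(example)
for previous, current in transitions(runs):
    print(f"{previous} -> {current}")
```

Each run of identical codes becomes one state with a duration, and the list of transitions gives the "last state preceding an asymmetric attack" for every attack in the bout.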

Twenty randomly chosen bouts (i.e., 10.8% of all play sessions) were coded by a separate researcher to assess inter-observer reliability. The resulting Cohen’s kappa coefficient values were 0.765 for the type of play initiation and 0.825 for the type of play state.
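For readers unfamiliar with the reliability statistic, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance from each coder's label frequencies. The sketch below is a generic, unweighted implementation under that standard definition; the label sequences are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two equally long sequences of labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from the product of each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical initiation-type codes from two coders (not real data).
coder_1 = ["FF"] * 8 + ["nonFF"] * 2
coder_2 = ["FF"] * 7 + ["nonFF"] * 3
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```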

Statistical analyses

We analyzed the data using generalized linear mixed models (GLMMs; glmmTMB function in the glmmTMB package) using R v. 3.6.3 (R Core Team 2020). For all analyses, we included player age (continuous) as a fixed variable to control for confounding effects. We set our alpha level at 0.05. To test the significance of the predictors, we compared the full models with the null models including only the control factor and the random effect using the likelihood ratio test. If interaction terms were non-significant, we removed them from the models. The datasets analyzed in this study can be accessed at: https://osf.io/5bxw7/.

For the analysis of the degree of asymmetry in active engagement between players, we used a GLMM with a gamma error structure and a log link function. The response variable was calculated by subtracting the total duration of the asymmetric attacks in each bout by a player that held an advantage for a shorter time from the total duration of the asymmetric attacks by the other player (Fig. 1c). The play bout duration (log-transformed) was controlled as an offset variable. The key predictor was the type of play initiation (categorical: FF or non-FF). To deal with pseudo-replication, we included play dyad as a random effect.
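The response variable and offset described above can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the authors' code: it assumes a bout is available as a list of (state, duration in seconds) pairs using the Fig. 1c labels, and it takes bout duration to be the sum of the coded state durations, which is a simplification. The function names and example values are hypothetical.

```python
import math


def degree_of_asymmetry(runs):
    """Longer minus shorter total advantage time (s) between the two players."""
    first = sum(d for state, d in runs if state == "ASYMF")
    other = sum(d for state, d in runs if state == "ASYMO")
    return max(first, other) - min(first, other)


def log_bout_duration(runs):
    """Log of bout duration, assumed here to be the sum of state durations."""
    return math.log(sum(d for _, d in runs))


# Hypothetical bout as (state, duration_s) runs (not real data).
bout = [("ASYMF", 4.0), ("SYM", 2.0), ("ASYMO", 1.5), ("NO", 0.5), ("ASYMF", 1.0)]

asymmetry = degree_of_asymmetry(bout)   # response variable for this bout
offset = log_bout_duration(bout)        # offset term for this bout
print(f"asymmetry = {asymmetry:.2f} s, offset = {offset:.3f}")
```

With a log link, adding the log-transformed bout duration as an offset means the model effectively describes asymmetry per unit of play time, so longer bouts are not counted as more asymmetric simply because they last longer.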

For the analyses of the duration of each asymmetric attack, we used GLMMs with a gamma error structure and a log link function. The response variable was the duration of each asymmetric attack in non-FF- and FF-initiated play. The key predictors were the type of attacker (i.e., “play initiator” or “the other” for the non-FF-initiated play; “first attacker” or “the other” for the FF-initiated play). Because the duration of an asymmetric attack might change as the session progressed or depending on the state of play preceding an asymmetric attack, we included the following factors as control variables to control for possible confounding effects in addition to player age: elapsed time since the start of play (continuous), the last state preceding an asymmetric attack (categorical: symmetric attack, asymmetric attack, no attack, or start state), and two-way interaction between type of attacker and elapsed time. To deal with pseudo-replication, we included identities of the attacker, recipient, and dyad as random factors.

Results

Overall, the mean ± SD duration of the play bouts was 21.05 ± 25.22 s. Of the 185 bouts of play fighting in our dataset, 150 and 35 were FF- and non-FF-initiated bouts, respectively. For FF- and non-FF-initiated play bouts, the mean ± SD durations were 23.15 ± 27.08 and 12.05 ± 11.14 s, respectively.

With regard to the degree of inter-player asymmetry in active engagement, the full model accounted for significantly more variance than the null model (χ2 = 7.88, df = 1, p < 0.01). The degree of asymmetry was affected by the type of play initiation, with the degree of asymmetry in non-FF-initiated play being significantly greater than that in FF-initiated play (Table 1; Fig. 2; Prediction 1 supported).

Table 1 Factors affecting the degree of asymmetry, duration of asymmetric attacks in FF-initiated play, and duration of asymmetric attacks in non-FF-initiated play
Fig. 2

Boxplots of the degree of asymmetry in FF-initiated and non-FF-initiated play sessions. **p < 0.01

In terms of the duration of an asymmetric attack performed in non-FF-initiated play, the full model accounted for significantly more variance than the null model (χ2 = 5.84, df = 1, p < 0.05). The asymmetric attack duration in non-FF-initiated play was affected by the type of attacker, with asymmetric attacks performed by the play initiator lasting significantly longer than those performed by the other player (Table 1; Fig. 3a; Prediction 2a supported). Conversely, for the duration of an asymmetric attack performed in FF-initiated play, the full model did not account for significantly more variance than the null model (χ2 = 2.73, df = 1, p = 0.10). Indeed, the duration of asymmetric attacks in FF-initiated play did not differ according to whether the attack was performed by the first attacker or by the other player (Table 1; Fig. 3b; Prediction 2b supported).
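The p-values attached to the likelihood ratio tests can be checked from the reported χ2 statistics alone. The sketch below assumes the usual chi-square reference distribution with 1 degree of freedom, for which the upper-tail probability has the closed form erfc(sqrt(χ2 / 2)). The function name and dictionary labels are hypothetical; the statistics are the three values reported in the Results.

```python
import math


def lrt_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if chi_square < 0:
        raise ValueError("a likelihood ratio statistic cannot be negative")
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))


# Chi-square statistics reported in the Results (all with df = 1).
reported = {
    "degree of asymmetry": 7.88,
    "attack duration, non-FF-initiated play": 5.84,
    "attack duration, FF-initiated play": 2.73,
}
for label, statistic in reported.items():
    p = lrt_p_value_df1(statistic)
    print(f"{label}: chi2 = {statistic:.2f}, p = {p:.3f}")
```

The resulting values fall in the ranges stated in the text: below 0.01 for the degree of asymmetry, between 0.01 and 0.05 for non-FF-initiated play, and about 0.10 for FF-initiated play.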

Fig. 3

Boxplots of the duration of asymmetric attacks: a by the first attacker or the other player in FF-initiated play sessions and b by the initiator or the other player in non-FF-initiated play. ns non-significant, *p < 0.05

Discussion

This study examined whether play sessions preceded by a face-to-face configuration were characterized by players’ mutual active engagement. Overall, our results showed that both individuals behaved more actively in interactions with a face-to-face opening than in interactions without one. Defining the inter-player asymmetry of active engagement in play based on the time during which each individual held an advantage over the other, we found that when play began with a face-to-face opening, the play was more symmetrical than when play began without one.

This result is interesting in light of the "50:50 rule" proposed in classic studies of play fighting (Aldis 1975; Altmann 1962). Given that play fighting involves aggressive behavior patterns (e.g., biting) that are also used in serious fights, players need to maintain a playful mood to prevent play from escalating into a serious fight (Palagi et al. 2016). The 50:50 rule posits that animal players prevent escalation into serious fights by maintaining symmetry, sharing equally the offensive (i.e., gaining an advantage) and defensive (i.e., surrendering an advantage) roles with their playmates. However, several empirical studies have shown that, in contrast to the 50:50 rule, play tends to be asymmetrical, with one individual holding significantly more advantage than the other (domestic dogs: Bauer and Smuts 2007; Ward et al. 2008; wolves: Essler et al. 2016; but see Kottferová et al. 2020). Our results suggest that when considering the symmetry or asymmetry of play, it is also important to consider how play is initiated.

Although play asymmetry could be reduced if both players avoided actively attacking the other and continued to play without gaining an advantage, the results of our analyses related to Prediction 2 suggest that this was not the case. In play bouts that were not preceded by a face-to-face configuration, only play initiators who faced their partner at the onset of play attacked their partner aggressively. Conversely, in play bouts preceded by a face-to-face configuration, both players attacked their partner aggressively. This result also implies that the face-to-face configuration functions as a platform to establish mutual engagement by both interactants.

Since we did not use focal sampling, we could not compare the frequency of FF- and non-FF-initiated play. Nevertheless, the fact that FF-initiated play accounted for more than 80% (150/185) of our dataset suggests that Japanese macaques likely assume a face-to-face configuration before play initiation. As mentioned, play fighting can escalate into serious fighting. Hence, to reduce the risk of escalation, it may be important to indicate to the partner in advance of play initiation that subsequent interactions are intended to be playful (Heesen et al. 2017). Studies on WEIRD human populations have shown that eye-to-eye or face-to-face contact before initiating an interaction indicates the participant's availability for that interaction (Kendon 1990), denotes a friendly stance toward the partner (Pillet-Shore 2018), and clarifies the nature and meaning of the subsequent interaction (Goffman 1967). Similarly, our subjects may have assumed the face-to-face configuration before play began to establish intersubjective engagement and communicate that the subsequent interaction would be playful in nature (Heesen et al. 2017). These results are also consistent with the finding that, beyond the context of play, primates use the face-to-face configuration as a platform to introduce other activities, such as copulation (bonobos: Savage and Bakeman 1978; stump-tailed macaques: Chevalier-Skolnikoff 1974) and reconciliation (chimpanzees: de Waal and Roosmalen 1979).

It should be noted that this study focused only on the face-to-face configuration at the moment the first playful contact occurred. Our results do not exclude the possibility that facing each other and establishing eye contact during ongoing play may also serve to maintain or re-establish interactants' engagement. Related to this possibility, several studies have shown that various species coordinate the engagement of playmates and promote a playful mood by emitting play signals during ongoing play (humans: Rothbart 1973; gorillas: Palagi et al. 2007; bonobos: Palagi 2008; chimpanzees: Davila-Ross et al. 2011; geladas: Mancini et al. 2013; domestic dogs: Palagi et al. 2015; Hanuman langurs, Semnopithecus entellus: Špinka et al. 2016; South American sea lions, Otaria flavescens: Llamazares-Martín et al. 2017). Although play fighting in Japanese macaques is not accompanied by play vocalizations, it involves a so-called play face (Scopa and Palagi 2016). While some studies have shown that play signals are used to maintain play rather than to initiate it and that play initiation is not always accompanied by play signals (Palagi and Mancini 2011; Wright et al. 2018), it is still possible that the face-to-face configuration that precedes play may include play faces or other play signals. We did not record data on play signals, and further research is needed to clarify the influence of a face-to-face configuration characterized by play signals on subsequent interactions.

Although cases in which individuals continuously shifted from grooming, contact-sitting, mounting, or play chasing to play fighting were excluded from the analysis, this cannot completely rule out the possibility that the interaction history preceding the play session, and the associated emotional state, may affect play engagement. An individual's emotional arousal may last for several minutes (e.g., Ioannou et al. 2014), and positive and negative emotional states can cause optimistic and pessimistic biases in their judgement, respectively (Mendl et al. 2009; Paul et al. 2005; Saito et al. 2016). Since play fighting involves aggressive behavior patterns, players need to constantly judge whether their counterparts' actions are playful or not. Hence, the interaction history between individuals and associated emotional states are likely to affect play engagement. Because we did not use focal sampling and collected only data for play events, we were unable to systematically collect data on the interaction history of individuals before play began. Hence, this study could not address this hypothesis. Further well-controlled studies are needed to address this issue.

Because it is difficult to identify accurately the direction of animals’ gaze in natural settings (Watson et al. 2015), the present study focused only on the face-to-face configuration at the onset of play bouts rather than on eye contact. Our study cannot rule out the possibility that eye-to-eye contact and the face-to-face configuration are functionally different. However, as indicated by the suggestion that the morphology of the non-human primate eye has evolved to camouflage the direction of gaze from others by reducing the contrast between the sclera and the iris, the non-human primate eye is ambiguous as a cue for the direction of attention available to others (Kobayashi and Kohshima 2001). Also, in quadrupedal animals, the direction of the body and head may be more important than that of the eyes as a cue for the direction of attention (Emery 2000). Further investigation is needed to examine the function of eye contact before the onset of interaction.

The subjects in this study were all males aged 0–2 years, and the function and significance of the face-to-face configuration may be affected by social factors (e.g., dominance rank and kinship) and individual attributes (e.g., sex and developmental stage). As Japanese macaque society is male-dominant, there are likely sex differences in gaze behavior among adults (cf. Chance 1967; Phillips and Mason 1976; Watson et al. 2015). Although this study focused on immature individuals that were not yet fully integrated into the dominance hierarchy of the group, it is an interesting question whether sex differences in the function and use of the face-to-face configuration exist before sexual maturity. Several studies have suggested that making eye contact and facing one another serve as mild threats in primate species with despotic societies (e.g., Harrod et al. 2020). Given that Japanese macaques have rather despotic societies (Thierry 2000), our results indicating that active engagement with playful interaction is promoted by a face-to-face configuration are even more interesting. Our results might be interpreted as providing support for the hypothesis that, rather than conveying a single species-specific message that is specified on the agonistic–affiliative axis, the face-to-face configuration serves as a basis for intersubjectivity by creating a situation in which one's attention and the attention of others are in contact and interactants are actively engaged with each other.