Public significance statement Our study investigates the public's perception of art generated by Artificial Intelligence (AI) compared with human-created art. We found that biases in evaluating AI-created art arise mainly when AI and human art are presented together, highlighting a competitive dynamic. Notably, our findings reveal that rather than a devaluation of AI art, there is an upvaluation of human art. This research is relevant as it challenges the notion of a negative bias against AI in art and provides insights into how both AI and human creativity are valued in contemporary society. We further find evidence that the positive bias can partially be traced back to the human capacity for empathy. As AI continues to expand its role in creative fields, understanding public perception of AI-generated art becomes crucial, especially in contexts where it competes with human art. These insights have broader implications for the evolving dynamics between technology and human creativity.

1 Introduction

Due to the development and wide availability of emerging technologies such as generative adversarial networks (GAN; Aggarwal et al. 2021) in recent years, the pervasiveness of AI in the field of (digital) art and creativity has greatly increased. Already today, AI writes poems, creates paintings, composes music, and choreographs dance routines (Darda and Cross 2023). Given these changes, it is necessary to understand how AI-generated pieces of art are evaluated by spectators and which individual differences could influence this evaluation.

Previous studies on the evaluation of AI-generated vs. human-created art have focused on two issues: whether humans can distinguish between AI-generated art and human-made art and whether humans are biased toward AI-generated artworks. Regarding the ability to distinguish artworks, most studies have reported that people could not consistently differentiate between human-made and AI-generated art (Chamberlain et al. 2018; Gangadharbatla 2022; Hitsuwari et al. 2023; Samo and Highhouse 2023). Concerning a bias toward AI-generated artworks, numerous studies have reported a negative bias toward AI art (Bellaiche et al. 2023; Chamberlain et al. 2018; Darda and Cross 2023; Hitsuwari et al. 2023; Hong 2018; Ragot et al. 2020; Samo and Highhouse 2023) and found that this bias persists even when the source of the art is unclear to the participants. However, other studies could not confirm a bias against art created by AI (Gangadharbatla 2022; Hong and Curran 2019; Zlatkov et al. 2023). Hence, the conditions under which a negative bias toward AI-created art emerges remain unclear, and an empirical explanation is needed. In this study, we first reviewed the previous studies on the emergence of a bias against AI-created art and linked their methodological similarities and differences to the emergence of such a bias. Building on these findings, we conducted an empirical study that accounts for possible influences of the study design on the emergence of a bias against AI-generated art. Specifically, our aim was to examine whether and how the simultaneous vs. independent presentation of AI-generated and human-created art affects purchasing intentions toward and the perceived aesthetic value of human-made art.

RQ1: Is human-created art preferred over AI-created art when separate groups evaluate either human-created or AI-created art (between subjects)? Is it preferred when human-created and AI-created art are presented simultaneously (within subjects)?

Bellaiche et al. (2023) argued that the preference for human-created art might be rooted in the experience that human-created art is a deeper communicative medium that transports a narrative and reflects the artist’s effort and time. Others have postulated that a threat to the anthropocentric worldview could be the reason for a bias against AI-created art (Millet et al. 2023). However, it is also possible that the perception that AI-created art competes with human artists’ jobs leads to a preference for human-created art. The degree to which individuals act upon or even recognize threats toward others varies from person to person. Thus, to investigate this competition, we focus on traits that may be related to experiencing a stronger connection with or compassion for artists (i.e., altruism and empathy) and assess their influence on the perception of AI vs. human art.

RQ2: Do personality traits influence the extent of bias against AI-created art? Does altruism influence whether and to what extent human-created art is preferred over AI-created art? Does empathy influence whether and to what extent human-created art is preferred over AI-created art?

Finally, people’s attitudes toward AI might influence the appreciation of AI-generated art (Hong et al. 2021). Thus, having a positive attitude toward AI might counterbalance a bias against AI-generated art.

RQ3: Does a personal positive attitude toward AI lead to less bias against AI-created art?

2 Theory

2.1 AI artwork

When asked directly, humans generally judge images created by AI as “art” (Mikalonyté and Kneer 2021), and in many cases, for the layperson, AI-generated works of art are indistinguishable from human-made ones (Samo and Highhouse 2023). However, others argue that AI cannot be an artist and cannot create art, because creating art requires intention and mutual message sending and receiving (Hertzmann 2020; Hong 2018). Moreover, researchers argue that AI alone, without any human input such as a training database or the programming of specific art-producing algorithms, cannot create art or art-like products. In other words, humans are always involved in AI-produced artwork and a clear distinction between human-made and AI-generated art may be difficult (Epstein et al. 2020). Therefore, it is crucial for this study to define what we mean by AI-generated art and what the technological advances mean for the connection between artist and art.

We understand AI art as any digital artwork which is the output of an AI tool (e.g., Midjourney; Midjourney, Inc. 2022) that has been prompted by a human, usually via written language (e.g., Stable Diffusion [Stability AI 2022] or Dall-E [OpenAI 2021]). While AI art is often linked with visual media like images and videos, it can extend to audio compositions such as music as well. The history of AI-assisted art dates back to 1973 (Garcia 2016), when Harold Cohen started his long-term project AARON, in which he translated components of visual decision-making into a small painting robot (Grba 2022). In these early stages, the interconnection between technology and the artist appears strong and direct. Over the last decades, however, the evolution of AI technologies, such as machine learning (ML), pattern recognition, and, more recently, generative adversarial networks (GANs) and text-to-image (T2I) diffusion models, has undoubtedly influenced artistic creation. These technologies provided artists with new tools to explore creative processes, leading to innovative artworks that challenge traditional notions of artistry and creativity. However, they have also strongly blurred the line between the artist and the tool (Mazzone and Elgammal 2019). Today's highly complex tools (i.e., generative AI) allow not only artists but anyone with access to them to create vivid images. Although operators still retain some control over tool functions, contemporary tools have increased degrees of freedom with regard to decisions in the design process. Consequently, attributing credit for the final artistic output has become more nebulous, as it is not immediately clear which entity—such as the AI tool developer, the artists whose work populates databases, or the tool operator—deserves what portion of recognition, and the connection between artwork and artist may be perceived as less direct.
Thus, generative AI “challenges conventional definitions of authorship, ownership, creative inspiration, sampling, and remixing” (Epstein et al. 2023, p. 1110). Furthermore, the emergence of new AI tools has increased the number of individuals who can produce digital art or art-like products with less effort (i.e., time and resources). Against this backdrop, note that when we henceforth speak of AI-created art, we do not imply that AI works independently of humans. Whether AI-created art should be considered art is also not the focal issue of this study. In line with Epstein et al. (2023), we do not see generative AI as a general threat to art itself but rather as a new medium whose impacts should nevertheless be studied. In the next section, we discuss the previous studies and their results concerning the comparison of human-made and AI-created art.

2.2 AI artwork vs. human artwork

Several studies have investigated whether the aesthetic appreciation and judgment of artworks are negatively biased against AI-generated art, yielding mixed results (for an overview of studies that have examined this effect in the context of images, see Table 1). While some studies could not confirm any bias toward AI-generated art, others have reported a negative bias against AI-made artworks. Table 1 shows that, among the previous studies on bias against AI-created images, those that applied a design in which AI- and human-generated art were presented to the same participants were able to detect a bias, whereas most studies in which participants rated either AI- or human-generated art could not find such a bias. To the best of our knowledge, only two studies found a negative bias toward AI when the images were rated separately. In the study by Gu and Li (2022), however, art teachers and students were surveyed, which should have made the competition aspect inherently salient; importantly, in the same article, the bias did not manifest when non-experts were the respondents. The second study compared actual human-created and AI-created images without displaying or varying any labeling that could give away the author (human vs. AI) of each piece (Samo and Highhouse 2023). Therefore, we argue that a negative bias toward AI-generated art arises only when a competition between AI-created art and human-created art is salient. Regrettably, no study has incorporated both designs (between- and within-subject) in one experiment. Thus, to understand the previous mixed results and to determine whether they can partially be attributed to methodological differences, we incorporated an experimental design with both a between- and a within-subject component.

Table 1 Previous studies on AI art

2.3 Considering individual differences: the roles of altruism, empathy, and attitudes toward AI

The perception, judgment, and appraisal of art are subjective matters, and individuals’ assessments do not always agree (Pelowski et al. 2017). Thus, the role of individual differences should be considered when investigating people’s appreciation and judgments of art. The extent to which psychological variables can explain differences in evaluations of AI-created vs. human-made art is particularly important. Only one study to date has examined the relations between interindividual psychological differences and differences in evaluations of AI-created and human-made art. Bellaiche et al. (2023) examined the influences of empathy, openness (Big Five), attitudes toward AI, and age on evaluations of both AI-created and human-made art but could not report any robust influences. Specifically, they did not find that any of the aforementioned variables significantly influenced the perceived worth, beauty, profundity, or subjective liking of art pieces, nor did these variables interact with the alleged creator (AI vs. human) of a piece of art. The authors found only that a personal positive attitude toward AI led to a higher appreciation of AI art compared with human-made art. The authors thus emphasized that further research is needed. Another study (Millet et al. 2023) examined the role of anthropocentric creativity beliefs in the emergence of a bias against AI-created art and found an interaction of this variable with the label (human vs. AI). We aim to build upon this research and extend the search for potential psychological explanations for differences in evaluations of AI vs. human art. However, our focus does not lie solely on identifying the psychological variables that are correlated with the overall perception of human-made and AI-generated art. Instead, we aim to identify variables that could potentially prompt individuals to recognize the competition that AI art might pose to human artists.

Altruistic behavior is motivated by a desire to benefit another person without expecting benefits for oneself (Feigin et al. 2014) and can be evoked by norms of appropriate behavior (Penner et al. 2005) or the experience of empathy (Batson 1987). Altruism includes peer punishment (PP), help-giving (HG), and moral courage (MC; Windmann et al. 2021). PP refers to individuals’ personal readiness to sacrifice their self-interest to impose punishment on those who violate social norms (e.g., fairness or reciprocity). The facet of HG entails the act of generously offering one's resources to individuals who are in need or who are deemed deserving (Windmann et al. 2021). Finally, MC signifies the readiness to protect personal ethical values and to uphold moral personal principles in the face of social threats, often in the context of a social power imbalance (Windmann et al. 2021). It has been argued that human-made art might be rated higher than AI-generated art due to the fear that AI might replace human creativity (Zlatkov et al. 2023), thereby threatening the identity of humans as the only entity capable of creativity (Millet et al. 2023). Due to a greater willingness to protect ethical values and moral principles and a desire to benefit those in need of protection (i.e., potentially human artists), the facets of MC and HG may be especially likely to play roles in the evaluations of human-made art in the presence of AI-generated art.

Empathy encompasses cognitive and emotional aspects and is considered a highly relevant psychological trait in examinations of aesthetic experiences in general (Wilkinson et al. 2021). Bellaiche et al. (2023) examined whether participants' judgments might be explained by their ability to empathize with other agents, including AI. Specifically, they investigated the extent to which human empathy could be transposed onto AI and whether any resulting disparities would manifest in the realm of art appreciation. As previously stated, they found no impact of empathy on the assessments of the presented images. The somewhat unexpected nature of these findings might be attributed to methodological factors. The utilization of a unidimensional empathy questionnaire (Spreng et al. 2009), as opposed to the more prevalent measurement tools that encompass multiple dimensions of empathy, might have played a pivotal role. Indeed, researchers have increasingly converged on a consensus that empathy is a multidimensional phenomenon that encompasses cognitive and emotional components (Davis 1996; Lima and Osório 2021; Malakcioglu 2022). Consequently, it is plausible that certain aspects of empathy exert an impact, whereas others do not. Cognitive empathy refers to the ability to adopt someone else’s perspective and therefore describes an individual's capacity to spontaneously understand and perceive things from another person’s psychological standpoint (Davis 1983). This facet of empathy is thus often referred to as perspective taking (PT). Emotional factors include fantasy, personal distress, and empathic concern (Davis 1980, 1983). While fantasy (FS) captures the inclination to emotionally immerse oneself in the world of characters in novels or movies, personal distress (PD) evaluates personal feelings of anxiety and discomfort when confronted with the mischances of others (Davis 1980; Pulos et al. 2004).
Finally, empathic concern (EC) is utilized to gauge feelings directed toward others (e.g., compassion or concern for individuals experiencing distress). The presence of AI-generated art may be perceived as competition with human creativity and a threat to artists’ jobs. Therefore, individuals with greater feelings of compassion for human artists and a greater ability to understand this potential competition are particularly likely to be affected by the presence of AI-generated art (i.e., EC). Furthermore, individuals with a greater capacity to perceive things from a human artist’s point of view might also be more sensitive to the competition between AI and human art creators (i.e., PT). Consequently, for the evaluation of human-created art while one is exposed to AI-generated art, the facets of PT and EC might be important.

An additional variable that may be relevant to the evaluation of AI-created art is attitude toward AI. While some people or cultures have strong concerns about the security of AI and fear that AI might have the potential to replace humans in workplaces (Bergdahl et al. 2023), others tend to be more open to AI and appreciate the advantages it offers to humans (Sindermann et al. 2021). The acceptance of AI and the evaluation of AI-generated products are likely to be influenced by individuals’ general attitudes toward AI. For example, Bellaiche et al. (2023) found that a positive attitude toward AI led to a higher perceived profundity and worth of a painting when it was labeled as created by AI rather than as created by a human.

2.4 Hypotheses

Based on the previous findings (see Table 1), we expected a bias in the evaluation of AI-generated art, which may influence purchasing intentions and perceived aesthetic value in favor of human-made art. We expected this bias to arise only when the competition from AI-generated art is made salient, that is, when AI-generated art is presented alongside human-made art. More specifically, we hypothesized the following:

H1a: Purchasing intentions are higher for human-made digital art evaluated in the presence of AI-generated art than for AI-generated art evaluated in the presence of human-made art (competition condition).

H1b: There is no difference in purchasing intentions between independently rated human-made digital art and independently rated AI-generated digital art (control condition).

H1c: Subjectively perceived aesthetic value is higher for human-made digital art evaluated in the presence of AI-generated art than for AI-generated art evaluated in the presence of human-made art (competition condition).

H1d: There is no difference in perceived aesthetic value between independently rated human-made digital art and independently rated AI-generated digital art (control condition).

The presence of AI-generated art while evaluating human-made art might reinforce concerns about AI (partially) displacing jobs in creative sectors or threatening artists’ livelihoods. This reinforcement might increase intentions to purchase human-made art and lead to more positive aesthetic evaluations of it, especially among people with more pronounced altruism and empathy.

H2a–H2d: When human-made digital art is evaluated in the presence of AI-generated art, intentions to purchase human-made art are positively related to (a) empathic concern, (b) perspective taking (both of which measure empathy), (c) help-giving, and (d) moral courage (both of which are altruism variables).

H3a–H3d: When human-made digital art is evaluated in the presence of AI-created art, subjectively perceived aesthetic value is positively related to (a) empathic concern, (b) perspective taking (both of which measure empathy), (c) help-giving, and (d) moral courage (both of which are altruism variables).

Beliefs about AI’s ability to be creative and the acceptance of creative AI are positively related to the assessment of AI-created art (Hong and Curran 2019; Hong et al. 2021).

H4a: When AI-created digital art is evaluated in the presence of human-made art, attitudes toward AI are more positively related to intentions to purchase AI-created art than to intentions to purchase human-created art.

H4b: When AI-created digital art is evaluated in the presence of human-made art, attitudes toward AI are more positively related to the perceived aesthetic value of AI-created art than to that of human-created art.

3 Methods

This study was approved by the Ethics Committee of the University of Hohenheim (Ref. No. 2023/8_Neef) and adhered to the ethical guidelines of the American Psychological Association (American Psychological Association 2017). For an overview of the R packages used, see the Appendix. All materials not cited and described in the following sections can also be found in the Appendix. The study was preregistered at https://aspredicted.org/PBR_YZC.

3.1 Sample

We collected data from a sample of the German Internet-using population, representative with respect to age, gender, education, and income, via a paid panel from May 10 to May 22, 2023. A total of N = 1179 participants completed our preregistered study (see preregistration for sample size determination). Adhering to our preregistered exclusion criteria, n = 60 participants were excluded for failing to answer the attention checks appropriately, n = 38 for implausibly fast response times (Leiner 2019), n = 127 for missing data in critical variables (more than 20% overall or less than 50% within one variable), and n = 2 for accurately identifying the research objective. Exclusions resulted in a final sample of N = 952 (Mage = 49.2, SDage = 16.0; 50.8% female participants). On average, participants needed M = 14.6 (SD = 5.3) min to complete the questionnaire. All participants gave informed consent to participate.

3.2 Midjourney

All images used in this study were created with the generative AI program Midjourney (Midjourney, Inc. 2022). Midjourney employs a diffusion-based T2I model to convert textual descriptions (prompts) into visual outputs (Lu et al. 2023). In this process, a large language model (LLM) extracts the semantic meaning of a prompt, which is then encoded into a numerical vector that guides the subsequent image generation (Midjourney, Inc. 2022). The diffusion model itself functions by incrementally introducing stochastic noise into the dataset of training images. During training, the model acquires the capability to reconstruct the original imagery by methodically reversing the introduced noise, and with sufficient iterations, it can synthesize new images (Midjourney, Inc. 2022). The images were chosen from the Midjourney Community Showcase in January 2023, representing selections that received particularly favorable ratings from Midjourney users.
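The forward ("noising") half of this process can be illustrated in a few lines. The following toy sketch is our own illustration, not Midjourney's actual implementation; the cosine noise schedule and the function name are assumptions chosen for clarity. It blends an image toward pure Gaussian noise over T steps; a real T2I model trains a neural network to reverse exactly this corruption.

```python
import numpy as np

def forward_diffusion(x0, t, T, rng):
    """Toy forward-diffusion step: blend image x0 toward pure noise.

    At t = 0 the output equals x0; at t = T it is pure Gaussian noise.
    The cosine schedule below is one common choice, used here only
    for illustration.
    """
    alpha_bar = np.cos(0.5 * np.pi * t / T) ** 2   # fraction of signal kept
    noise = rng.normal(size=np.shape(x0))
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))                   # stand-in for a training image
slightly_noised = forward_diffusion(image, t=10, T=1000, rng=rng)
pure_noise = forward_diffusion(image, t=1000, T=1000, rng=rng)
```

The generative step then consists of starting from `pure_noise` and applying the learned reverse process, guided by the prompt's embedding vector.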

3.3 Procedure

The entire study was conducted in German. Initially, participants were presented with a standard message that provided information about the duration and setting (a calm environment and the use of a computer), followed by inquiries about demographic details. The participants were randomly assigned to four groups (two control, two experimental). Subsequently, irrespective of their assigned group, all participants were asked to evaluate 22 digital art images based on aesthetic appeal and to report their intention to purchase each image (for an example image, see Fig. 1; all digital art images can be found in the Appendix). The first two images served only as attention checks to ensure participant engagement; the analysis included only images 3–22. We manipulated the alleged creator of the images. The first control group evaluated images labelled AI-generated, whereas the second control group evaluated images labelled human-artist-made. We refer to these groups as C-AI and C-Human. The two experimental groups each rated ten images labelled AI-generated and ten labelled artist-made, thus creating a subliminal competitive scenario (competition condition). From the set of 20 images, Experimental group A evaluated the first set of images under the label AI-generated and the second set under the label artist-made; Experimental group B evaluated the images with the opposite labeling (for the images in each set, refer to the Appendix). Henceforth, we refer to images labeled as human-created among AI-generated images as Ex-Human and to images labeled as AI-generated among artist-created images as Ex-AI. It is important to note, however, that the actual order of the images was randomized to exclude order effects. As an example, a participant in Experimental group A could have first evaluated one AI-labeled image from set 1, then two human-labeled images from set 2, then again an AI-labeled image from set 1, and so on.
The question assessing purchasing intentions was “Would I buy this piece of art?”; aesthetic value was assessed via “How do you rate the attractiveness of this work of art?”. We are aware that purchase intention and aesthetic value do not fully capture the general appreciation of the respective art pieces; however, these variables have been used in previous studies (for a similar measure, see, e.g., Gu and Li 2022). Additionally, using only single items might further compromise the measurement (Neef et al. 2023). However, since our participants had to rate 22 images, using multi-item scales would have inflated the length of the survey considerably. We thus opted for a balance between measurement accuracy and participant burden.

Fig. 1
figure 1

Example image of AI-generated art by Midjourney

The specific instructions for the groups can be found in the Appendix. After evaluating the images, participants were instructed to complete questionnaires that evaluated their empathy (Paulus 2009), altruism (Windmann et al. 2021), attitudes toward AI (Sindermann et al. 2021), and aesthetic responsiveness (Schlotz et al. 2021). Additionally, a second altruism questionnaire (Rushton et al. 1981) was included in the study, although its findings were not intended to be included in the current article (as preregistered). Finally, participants were given the opportunity to provide their speculations regarding the purpose of the study before being provided with a detailed debriefing and being thanked for their participation.

3.4 Independent measures

3.4.1 The Saarbrueck Personality Questionnaire on Empathy

The validated questionnaire utilized in the study is a German adaptation of the Interpersonal Reactivity Index (Davis 1983). The original version of this self-report inventory comprises 28 items divided into four subscales. The adapted German version (Paulus 2009) consists of 16 items. Four items are dedicated to each facet, and each item is assessed on a 5-point Likert scale (“never,” “seldom,” “sometimes,” “often,” “always”). The subscales consist of the fantasy (FS) facet (e.g., “I am good at imagining the feelings of a person in a novel”), the personal distress (PD) facet (e.g., “In emergency situations, I feel anxious and uncomfortable”), the perspective-taking (PT) facet (e.g., “I believe there are two sides to every problem and therefore try to take both into account”), and the empathic concern (EC) facet (e.g., “I have warm feelings for people who are less well off than I am”) (Cronin 2018; Paulus 2009). EC and PT were of interest for the current study. With Cronbach’s Alpha (Cronbach 1951) values of αEC = 0.73 and αPT = 0.72, both subscales demonstrated acceptable internal consistencies (Taber 2018).
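For readers unfamiliar with the statistic, Cronbach's alpha can be computed directly from an item-response matrix. A minimal sketch with simulated data (the function name and the simulated responses are ours, purely illustrative, and not the study's data or code):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum scores
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Simulated 4-item subscale: one shared latent trait plus item-specific noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
responses = latent + rng.normal(scale=1.0, size=(200, 4))
alpha = cronbach_alpha(responses)
```

Values around 0.7 or higher, as reported for EC and PT, are conventionally taken to indicate acceptable internal consistency.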

3.4.2 Facets of Altruistic Behaviors (FAB) scale

The FAB scale (Windmann et al. 2021) is a 15-item self-report measure for assessing three distinct facets of altruistic behavior traits, each represented by five items and assessed on a 5-point Likert scale (“fully disagree,” “disagree,” “undecided,” “agree,” “fully agree”). Specifically, help-giving (HG; e.g., “In a conflict, I prefer to turn to the weak than to the strong”), moral courage (MC; e.g., “It has already happened that I have offended people because of my moral convictions”), and peer punishment (PP; e.g., “If someone intentionally takes advantage of the community, I discreetly reciprocate in some way”) were measured. The facets of interest—HG and MC—both yielded high internal consistencies (αHG = 0.81; αMC = 0.79).

3.4.3 Attitude Towards Artificial Intelligence (ATAI) scale

The validated ATAI scale (Sindermann et al. 2021) consists of five items, which represent statements with which participants rate their agreement on a 5-point Likert scale (“fully disagree,” “disagree,” “undecided,” “agree,” “fully agree”). Three items measure fear or apprehension toward AI (e.g., “I fear artificial intelligence”), whereas the remaining two items measure acceptance of AI (e.g., “Artificial intelligence will benefit humankind”). After reverse-coding the fear items, all five items can be combined into a single dimension measuring attitudes toward AI. In our sample, the scale yielded high internal consistency (α = 0.83).
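Combining the two item groups requires reverse-coding the fear items before averaging. A minimal sketch of this scoring step (the item positions and function name are our illustrative assumptions, not the published item ordering):

```python
import numpy as np

def atai_score(responses, fear_items=(0, 1, 2), scale_max=5):
    """Average ATAI score after reverse-coding the fear items (1<->5, 2<->4).

    `responses` is an (n_participants, 5) matrix of Likert answers (1-5);
    higher scores then indicate a more positive attitude toward AI.
    """
    r = np.asarray(responses, dtype=float).copy()
    idx = list(fear_items)
    r[:, idx] = (scale_max + 1) - r[:, idx]   # reverse-code the fear items
    return r.mean(axis=1)

# Maximal fear (5, 5, 5) and minimal acceptance (1, 1) yields the most
# negative possible attitude score of 1.0 on the 1-5 scale.
most_negative = atai_score(np.array([[5, 5, 5, 1, 1]]))
```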

3.4.4 The Aesthetic Responsiveness Assessment (AReA) scale

The Aesthetic Responsiveness Assessment (Schlotz et al. 2021) is a validated screening tool for individual differences in aesthetic responsiveness (e.g., “I notice beauty when I look at art”). It consists of 14 items assessed on a 5-point Likert scale (“never,” “seldom,” “sometimes,” “often,” “very often”) and captures how individuals perceive and respond to aesthetic stimuli (e.g., art or designs). The internal consistency was α = 0.91. This variable was not included in the hypothesis tests; it was used only to confirm that our groups did not differ in their general responsiveness to art.

3.5 Analysis approach

Initially, we computed person indices for all variables by summing all items measuring a variable and dividing by the number of items answered for that variable (i.e., the mean). Specifically, these variables were purchasing intentions, aesthetic value, AI attitude (Sindermann et al. 2021), aesthetic responsiveness (Schlotz et al. 2021), the two empathy subscales in question (Paulus 2009), and the two altruistic behavior facets in question (Windmann et al. 2021). We first ensured via an ANOVA that our groups did not differ in aesthetic responsiveness, F(1, 949) = 0.59, p = 0.441. Then, we verified that the two sets of pictures received similar evaluations when given the same label: set 1 and set 2 should not differ in purchase intention and aesthetic evaluations. As hoped, under the same label, the two sets did not differ from each other significantly in the experimental conditions: for purchase intention, p = 0.200 and p = 0.552; for aesthetic value, p = 0.065 and p = 0.482.
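The person-index computation amounts to a missing-data-aware mean per variable. A minimal sketch of this scoring rule (our reconstruction, not the study's R code; NaN marks missing answers, and the 50% threshold mirrors the within-variable exclusion criterion from Sect. 3.1):

```python
import numpy as np

def person_index(item_matrix, min_answered=0.5):
    """Mean across one variable's items, ignoring missing responses (NaN).

    Rows with fewer than `min_answered` of the items answered are set to
    NaN, mirroring the study's missing-data exclusion rule.
    """
    item_matrix = np.asarray(item_matrix, dtype=float)
    counts = (~np.isnan(item_matrix)).sum(axis=1)   # items answered per person
    sums = np.nansum(item_matrix, axis=1)
    index = sums / np.maximum(counts, 1)            # mean of answered items
    index[counts < min_answered * item_matrix.shape[1]] = np.nan
    return index
```

For example, a participant who answered three of four items receives the mean of those three answers, while a participant who answered only one of four receives NaN.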

For Hypotheses 1a–d, we conducted a series of t tests. For Hypotheses 1b and 1d, we compared the control conditions (the means across all 20 images). For Hypotheses 1a and 1c, we compared the experimental conditions. However, given that our experimental groups rated ten images labeled as AI-generated and ten labeled as human-created, we conducted two t tests for each of these hypotheses, i.e., set 1 from the first experimental group (Ex-Human) vs. set 2 from the second experimental group (Ex-AI), and the other way around. For Hypotheses 2–4, we computed two repeated-measures (multilevel) regression models (M. Kim et al. 2020), with the two sets of pictures representing two measurement points, participants forming the grouping variable (random effect), and purchasing intentions or aesthetic value as the dependent variable. After z-standardizing all independent variables and the response variable, we used restricted maximum-likelihood estimation; the pseudo-R² statistics for the individual models were calculated following Nakagawa and Schielzeth (2013).
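As an illustration of the H1 comparisons, the label effect within the competition condition reduces to independent-samples t tests across the two experimental groups (same image set, opposite label). A sketch with simulated ratings (the group means, sizes, and variances are invented for illustration and do not reflect our data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated mean purchase-intention scores (1-5 scale) per participant.
# The same image set is rated under opposite labels by the two groups.
labeled_human = rng.normal(loc=3.3, scale=0.8, size=240)  # Ex-Human ratings
labeled_ai = rng.normal(loc=3.0, scale=0.8, size=240)     # Ex-AI ratings

# Welch's t test avoids assuming equal variances across groups
t_stat, p_value = stats.ttest_ind(labeled_human, labeled_ai, equal_var=False)
```

Because participants, not images, are the unit of comparison here, each group contributes one mean score per participant for the relevant set.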

The model assumptions for all models were assessed after an initial calibration. We detected outliers in the model for purchasing intentions, based on a composite outlier score (Lüdecke et al. 2021) obtained by the joint application of multiple multivariate outlier detection methods, namely, z-scores (Iglewicz and Hoaglin 1997), Mahalanobis distance (Cabana et al. 2021), Robust Mahalanobis distance (Gnanadesikan and Kettenring 1972), Minimum Covariance Determinant (Leys et al. 2018), Invariant Coordinate Selection (Archimbaud et al. 2018), and Local Outlier Factor (Breunig et al. 2000). We excluded n = 7 participants who were classified as outliers by at least half (i.e., 3) of the methods used. We applied the same method to detect outliers for the aesthetic value model, detecting n = 6 participants. These participants were removed from the respective models, which were then re-estimated.
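The composite-score logic (flag a participant only if several detectors agree) can be sketched as follows. For brevity we implement just two of the six detectors (univariate z-scores and Mahalanobis distance); the thresholds and function names are our choices for illustration, not those of the cited implementations:

```python
import numpy as np
from scipy.stats import chi2

def zscore_outliers(X, thresh=3.0):
    """Flag rows where any standardized value exceeds the threshold."""
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return (np.abs(z) > thresh).any(axis=1)

def mahalanobis_outliers(X, alpha=0.001):
    """Flag rows whose squared Mahalanobis distance exceeds a chi2 cutoff."""
    d = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    md2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)   # squared distances
    return md2 > chi2.ppf(1 - alpha, df=X.shape[1])

def composite_outliers(X, min_votes=2):
    """Flag a row only if at least `min_votes` detectors agree (here: both)."""
    votes = zscore_outliers(X).astype(int) + mahalanobis_outliers(X).astype(int)
    return votes >= min_votes
```

In the study, six detectors were applied and a participant was excluded when at least three of them agreed; the voting rule generalizes directly.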

We further detected potential multicollinearity in both models (i.e., purchasing intentions and aesthetic value). Inspecting the correlations between the predictors, we found that the empathy and altruism predictors were positively correlated (range: r = 0.21 to 0.54; rMean = 0.39), indicating potential multicollinearity (see Appendix). Multicollinearity diminishes the precision of the estimated coefficients and thereby reduces statistical power. Such a reduction in power implies a reduced ability to identify true effects as significant, as it becomes challenging to isolate the individual independent variables' respective impacts from one another (Voss 2005). However, multicollinearity typically becomes a problem only once predictor correlations substantially surpass r = 0.50 (Vatcheva et al. 2016), with some arguing for a cut-off at r = 0.80 (Berry and Feldman 1985). Nevertheless, recognizing that multicollinearity may diminish the statistical significance of our findings (reducing the probability of rejecting the null hypothesis that a predictor is non-significant), we made the deliberate choice to present these results, which can be considered more conservative in nature.
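A quick way to screen for this kind of multicollinearity is to compute the largest absolute pairwise correlation among the predictors; a small Python sketch with hypothetical predictor columns:

```python
import numpy as np

def max_abs_predictor_correlation(X):
    """Largest absolute pairwise correlation among predictor columns."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    np.fill_diagonal(r, 0.0)   # ignore each predictor's correlation with itself
    return float(np.abs(r).max())

# Hypothetical predictors: the first two are nearly collinear, the third is not.
X = np.column_stack([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.2, 1.9, 3.1, 4.2, 4.8],
    [3.0, 5.0, 1.0, 4.0, 2.0],
])
print(round(max_abs_predictor_correlation(X), 2))  # 0.99 -> clearly above r = 0.80
```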

Moreover, we compared the random-intercept models with models including both random intercepts and random slopes. However, model comparison did not show a significantly better fit for the more complex models (purchasing intentions: χ2 = 1.83, p = 0.400; aesthetic value: χ2 = 0.17, p = 0.921). Thus, we report only the results of the random-intercept models for purchasing intentions and aesthetic value. Raw data and the codebook are available here: https://doi.org/10.17605/OSF.IO/TS4V3. Data were analyzed using RStudio, version 2023.09.0 (Posit team 2023).
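The model comparison reported here is a likelihood-ratio test. Because the random-slope model adds two parameters (the slope variance and the intercept-slope covariance), the test has df = 2, for which the chi-square survival function takes the closed form exp(-χ²/2). A sketch with hypothetical log-likelihoods chosen to reproduce the reported χ² of 1.83:

```python
import math

def lrt_two_extra_params(ll_restricted, ll_full):
    """Likelihood-ratio test when the fuller model adds two parameters
    (random-slope variance and intercept-slope covariance, so df = 2).
    For df = 2, the chi-square survival function is simply exp(-x / 2).
    (For variance components on the boundary, this p is conservative.)"""
    chi2 = 2.0 * (ll_full - ll_restricted)
    p = math.exp(-chi2 / 2.0)
    return chi2, p

# Hypothetical log-likelihoods chosen to reproduce the reported chi-square:
chi2, p = lrt_two_extra_params(-1000.0, -999.085)
print(round(chi2, 2), round(p, 2))  # 1.83 0.4 -- matches the reported p = 0.400
```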

4 Results

To reiterate, images labeled as created by a human artist among the set of images labeled as AI-generated are referred to as Ex-Human (set 1 in Experimental group A and set 2 in Experimental group B). Conversely, images labeled as AI-generated among the set of images labeled as created by a human artist are referred to as Ex-AI (set 2 in Experimental group A and set 1 in Experimental group B). In the first control group, participants evaluated only AI-labeled images, identified as C-AI. In the second control group, participants assessed only artist-made images, designated as C-Human.

4.1 Hypothesis 1a

The first t test indicated that the experimental conditions (Ex-Human, M = 3.31, SD = 1.18; Ex-AI, M = 3.09, SD = 1.22) differed significantly in purchasing intentions, |t|(486) = 2.00, p = 0.046, |d| = 0.19, 95% CI [-0.01; 0.36]. The second t test also indicated that the experimental conditions (Ex-Human, M = 3.45, SD = 1.32; Ex-AI, M = 3.15, SD = 1.21) differed significantly in purchasing intentions, |t|(486) = 2.59, p = 0.010, |d| = 0.23, 95% CI [-0.41; -0.06]. The effect sizes indicated a small effect (Cohen 1988). These results support H1a.
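For reference, the reported |d| values follow the pooled-SD formula for Cohen's d. A quick sketch (the group sizes are inferred from the degrees of freedom, df = 486, i.e., roughly 244 per group, so the result only approximately reproduces the reported value):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

# Means/SDs from the first t test; n = 244 per group inferred from df = 486.
d = cohens_d(3.31, 1.18, 244, 3.09, 1.22, 244)
print(round(abs(d), 2))  # 0.18 -- close to the reported 0.19 (rounded inputs)
```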

4.2 Hypothesis 1b

The t test indicated that the control conditions (C-Human, M = 3.17, SD = 1.14; C-AI, M = 3.07, SD = 1.18) were not significantly different in purchasing intentions, |t|(462) = 0.91, p = 0.363, |d| = 0.08, 95% CI [-0.10; 0.27]. These results support H1b.

4.3 Hypothesis 1c

The first t test indicated that the experimental conditions (Ex-Human, M = 4.18, SD = 1.12; Ex-AI, M = 4.01, SD = 1.03) showed marginally significant differences in aesthetic value, |t|(486) = 1.75, p = 0.080, |d| = 0.17, 95% CI [-0.02; 0.34]. The second t test indicated that the experimental conditions (Ex-Human, M = 4.37, SD = 1.11; Ex-AI, M = 4.08, SD = 1.17) were significantly different in aesthetic value, |t|(486) = 2.82, p = 0.005, |d| = 0.26, 95% CI [-0.43; -0.08]. The effect sizes indicated a small effect (Cohen 1988). These results support H1c.

4.4 Hypothesis 1d

The t test indicated that the control conditions (C-Human, M = 3.98, SD = 1.19; C-AI, M = 3.91, SD = 1.20) were not significantly different in aesthetic value, |t|(462) = 0.70, p = 0.487, |d| = 0.06, 95% CI [-0.12; 0.25]. These results support H1d. See Fig. 2 for plots of all tests.

Fig. 2
figure 2

Violin plots of the t tests. The red dots represent the mean and the lines spread out two SDs. A & B = Hypothesis 1a; C & D = Hypothesis 1c; E = Hypothesis 1b; F = Hypothesis 1d

4.5 Ad hoc tests

Interestingly, an ad hoc analysis showed that there were no differences between C-AI and Ex-AI in purchasing intentions (set 1: |t|(497) = 0.81, p = 0.416; set 2: |t|(483) = 0.09, p = 0.932) or aesthetic value (set 1: |t|(497) = 1.66, p = 0.101; set 2: |t|(483) = 0.96, p = 0.336). By contrast, ad hoc analyses revealed that the Ex-Human labels were rated significantly better than the C-Human labels across both sets, with respect to both purchasing intentions (set 1: |t|(483) = 2.84, p = 0.005; set 2: |t|(497) = 2.66, p = 0.008) and aesthetic value (set 1: |t|(483) = 3.20, p = 0.001; set 2: |t|(497) = 3.72, p < 0.001). Thus, Ex-AI art was not devalued compared with C-AI art but remained about the same, whereas the Ex-Human-labeled art was upvalued compared with the C-Human-labeled images.

4.6 Hypotheses 2a-d and 4a

On the basis of the above results, we wanted to determine whether differences in purchasing intentions between the labeling in the two experimental groups could be partially explained by our chosen psychological variables. To assess the effects of the empathy variables empathic concern (EC) and perspective taking (PT), the altruism variables moral courage (MC) and help-giving (HG), and attitudes toward AI on purchasing intentions, we used a (repeated-measures) multilevel regression model. Table 2 presents the results.

Table 2 Results for the multilevel regression predicting purchasing intentions

Our findings indicate significant positive main effects of labeling, attitudes toward AI, and HG, and marginally significant main effects of EC and MC, suggesting a positive influence of these variables on overall purchasing intentions. In terms of labeling, an image labeled as human-made elicited significantly higher purchasing intentions than one labeled as AI-generated. Additionally, we observed a significant interaction between the artist label and PT, which was the only significant interaction we found. Specifically, when the label indicated that an image was created by a human artist, PT had a more positive impact on purchasing intentions than when the label indicated that the image was generated by AI. The main effect of PT itself was not statistically significant, which can be explained by the interaction: PT had a negative effect on purchasing intentions for AI-labeled images but a positive effect for human-labeled images, resulting in an overall neutral effect when the label was disregarded. Taken together, we found support only for H2b but not for H2a, H2c, H2d, or H4a.

4.7 Hypotheses 3a-d and 4b

On the basis of the above results, we wanted to determine whether the differences we found in perceived aesthetic value between the labeling in the two experimental groups could be partially explained by the psychological variables we tested. To assess the effects of the empathy variables EC and PT, the altruism variables MC and HG, and attitudes toward AI on perceived aesthetic value, we used a (repeated-measures) multilevel regression model. Table 3 presents the results.

Table 3 Results for the multilevel regression predicting aesthetic value

The above findings indicate significant positive main effects of attitudes toward AI, labeling, HG, and PT and a marginally significant positive main effect of MC, suggesting positive influences of these variables on overall perceived aesthetic value. Additionally, we observed a (marginally) significant interaction between the label indicating the creator of the art and attitudes toward AI. Specifically, when the label indicated that an image was created by a human artist, attitudes toward AI had a less positive impact on perceived aesthetic value, compared with when the label indicated that the image was generated by AI. Nevertheless, the general effect remained positive, indicating that even for human-labeled images, a positive effect of attitudes toward AI was observed in both models. All other interactions were non-significant. Taken together, we found support only for H4b but not for H3a, H3b, H3c, or H3d.

5 Discussion

5.1 Research question 1

Inconclusive findings on the existence of a negative bias toward AI-generated art, in terms of purchasing intentions and aesthetic judgment, prompted us to take a closer look at these discrepancies (RQ1). Some studies found a negative bias toward AI-generated art (Bellaiche et al. 2023; Chiarella et al. 2022; Millet et al. 2023), whereas others did not (Hong and Curran 2019; Israfilzade 2020; Xu et al. 2020). Nevertheless, these studies often shared a common line of reasoning that led them to expect a negative bias toward AI-generated art. The literature on human perception of art has identified that perceived intentions (Jucker et al. 2014), meaning (Graf and Landwehr 2015), evoked emotion (Freedberg and Gallese 2007), and effort (Kruger et al. 2004) exert a considerable influence on the value that humans assign to specific works of art. The reasonable assumption of many previous studies has been that AI-generated art cannot satisfy these factors as well as human art can and that the unfavorable valuation of AI-generated art may stem from this divergence (Bellaiche et al. 2023; Chiarella et al. 2022; Darda and Cross 2023). If one were to adhere to this reasoning, however, the divergent outcomes across studies might appear puzzling. Upon scrutinizing the methodologies employed in these studies, a crucial distinction emerges: the negative bias toward AI art appears when AI-generated art and human-created art are appraised concurrently, but it does not materialize when separate groups of participants evaluate each type of art exclusively (see Table 1). Our results point in the same direction. When human-labeled (Ex-Human) and AI-labeled (Ex-AI) images are presented in a randomly mixed order (competition condition), the human-labeled images receive higher ratings on purchasing intentions and aesthetic value.
However, if the theoretical assumption were correct that the lack of intention, meaning, emotion, and effort translates directly into lower aesthetic valuation and purchasing intentions, a difference should also be detected between the two control groups, and the human-labeled control group should not differ from the human-labeled treatment group (i.e., Ex-Human). Yet the negative bias toward AI did not manifest in our control groups, each of which had assessed either human-labeled or AI-labeled art exclusively, whereas contrasting the two human-labeled groups (i.e., human-labeled with and without AI competition) revealed a significant difference. Essentially, this finding means that the AI art had not been devalued; rather, only the human-made art in the competition condition had been upvalued. Therefore, instead of speaking of a negative bias toward AI, it might be more appropriate to speak of a positive bias toward humans in the presence of AI.

To conclude, in our study, we replicated both the existence and the non-existence of a negative bias toward AI, or more precisely a positive bias toward humans, and we explained why previous findings have conflicted. Essentially, we demonstrated that the different results of previous studies likely did not come about by chance but can be justified methodologically and theoretically. Importantly, we demonstrated that creating a competition condition by concurrently displaying AI-generated and human-created art produced variations in purchasing intentions and aesthetic judgment. Millet et al. (2023) emphasize that AI elicits negative reactions because it shakes people's deeply rooted anthropocentric worldviews, which in turn lead to the lower valuation of AI-generated art. Conversely, one could also say that human-centered worldviews lead to a magnification of the perceived value of human art.

5.2 Research question 2

Our second research question asked whether it is possible to differentiate between the variances of our two experimental groups using variables linked to both artistic perceptions and perceptions of competition. For this purpose, we ran multilevel regression models. On a general level, the results indicate that, independent of the assumed creator of the piece, the intention to purchase a piece of art depended significantly on the two altruism variables (MC and HG) and the empathy variable EC. In terms of the perceived aesthetic value of the digital art pieces, the model also yielded main effects of HG, MC, and attitudes toward AI, but not of EC; instead, PT was significant. These findings are in line with a recent study showing that a prosocial (altruistic) personality predicted personal engagement in art across a 2-year period (van de Vyver and Abrams 2018). Another study showed that prosocial behaviors (e.g., help-giving, donating, and volunteering) were positively associated with general art consumption (Kou et al. 2020). It has also been argued that the subjectively perceived aesthetic value of a piece of art depends on a person's empathic ability (Crozier and Greenhalgh 1992), specifically their PT (Miller and Hübner 2023). Furthermore, both models demonstrated that the label had a significant main effect on purchasing intentions and aesthetic value. These results are essentially what was expected after our t test analyses found differences in the competition condition.

Regarding the score disparities between AI- and human-labeled images in the simultaneous presentation in relation to altruism and empathy, the findings were mixed. For purchasing intentions, we found that empathy in the form of PT interacted with the label, however without a significant main effect of PT. Upon closer inspection, the influence of PT was negative for AI-labeled images and positive for human-labeled images. Consequently, this difference indicates that AI-generated art is subject to devaluation, whereas the value of human-created images is enhanced for participants with strong PT. This outcome stands in contrast to the findings of Bellaiche et al. (2023), who found no interaction between empathy and the label. Nonetheless, as described in the introduction, this incongruity can potentially be attributed to their approach of defining empathy as a one-dimensional construct. Further, in consideration of the outcomes for Research Question 1, it seems plausible that PT prompted individuals with a higher expression of this empathy facet to perceive the competition between human and AI art, and their ratings were subsequently influenced by this competition. Interestingly, regarding the aesthetic value of the artwork, there was no significant interaction between the capacity for PT and the label attributed to the artwork. Consequently, it appears that, even though empathy enhances individuals' willingness to purchase artwork created by humans and to support artists financially, it does not influence the aesthetic value of the art itself.

Furthermore, the interactions of EC, MC, and HG with the type of label were also not significant in the aesthetic value model. This indicates that altruism, at least as we operationalized it, did not significantly contribute to explaining the observed differences in either model, neither for purchasing intentions nor for aesthetic value. Regarding MC, this could have arisen from the fact that the concept is characterized as acting in accordance with one's personal convictions, but it also includes the anticipation of potential punishment or reprisal (Pianalto 2012); in our study, this second aspect was not very pronounced. Furthermore, hypothetical purchasing decisions, and especially aesthetic evaluation, were only indirect measures of personal willingness to actively help other people in need. This indirectness might further explain the lack of interaction between the creator label and HG. In addition, participants never had to choose between the images when indicating their purchasing intentions; instead, all images were evaluated serially. The images were thus in only slight, not direct, competition. If participants had been forced to choose a picture (see Millet et al. 2023), an effect might have been conceivable. Future studies could investigate this aspect. Additionally, the lack of a significant interaction between EC and the type of label was very surprising to us. It is possible that participants did not yet consider the artists' situations threatening enough. Similarly to the aforementioned points, it could be posited that a more pronounced competition might have yielded different outcomes. Nevertheless, it is essential to underscore that an excessively forceful manipulation could have generated an unrealistic scenario, thereby compromising the external validity of the experiment.
Finally, another explanation why empathy variables did not consistently interact with the creator-label could be that individuals with a high level of empathy can potentially experience empathy not only towards other humans, but also towards AI-agents under certain conditions (e.g., chatbots; W. B. Kim and Hur 2023). Whether this was the case in this study could be tested in future studies.

5.3 Research question 3

The third research question addressed whether participants' personal attitudes toward AI could explain the differences between the experimental groups. On a general level, attitudes toward AI had a significant influence on purchasing intentions and aesthetic value, meaning that they affected the evaluation of both human-labeled and AI-labeled images. Interestingly, this effect mirrored the PT effect in reverse: we found an interaction in the model for aesthetic value but not in the model for purchasing intentions. In consideration of the positive main effect of attitudes toward AI on aesthetic judgment, the following inference can be drawn: When participants believed that an image was generated by AI, the positive effect of their AI attitude was amplified; when they perceived the image to be created by a human, the influence was comparatively weaker but remained positive. This finding initially surprised us. Yet, if one understands attitudes toward AI as an integral facet of a broader affinity for technology, as proposed by certain scholars (Henrich et al. 2022), and considers that the art we presented was exclusively digital, this perspective could potentially explain the results. Furthermore, the results are in line with the study by Bellaiche et al. (2023), in which participants rated the worth of an art piece slightly higher when the label was AI and their attitude toward AI was stronger.

5.4 Limitations and future research

As is true for every study, ours comes with certain limitations and potential for future research. All the images we used were actually AI-generated. It would be worthwhile to replicate the study using only traditional human-made art pieces. Furthermore, we did not distinguish between different styles of art (e.g., representational, abstract); however, since we randomized the label attributions, we ensured that every participant saw different styles. Nevertheless, specifically for figurative images, it is conceivable that some participants were able to identify whether an image was AI-created and did not fall for our labeling. Research shows that individuals generally struggle to distinguish between AI and human art (Samo and Highhouse 2023), but we cannot be certain that this holds for all participants, especially those who are more familiar with digital art and art in general.

Additionally, we assessed purchase intention without actually setting a price for the art pieces. We were more interested in the appreciation our participants showed toward the art pieces (and thereby in detecting a bias) than in including a price, which would probably have influenced the participants' responses. Nevertheless, investigating whether the bias we found depends on the price of an art piece, and whether it is stable across price levels or changes with them, could be an interesting future study. Further, we did not find any differences between the C-AI and the Ex-AI conditions. It must be stressed that the absence of an effect does not directly imply that no effect is present; an equivalence analysis would be needed to support such a claim (Lakens et al. 2018). However, since this null result was consistent across both sets, we see it as a strong indication. Additionally, when assessing purchase intentions, participants may contemplate who benefits financially from the hypothetical purchase. For art labeled as human-made, they might consider supporting the actual artist. In contrast, for art labeled as AI-generated, it is less apparent who would receive payment: the operator of the AI, the AI tool manufacturer, or perhaps all artists represented in the dataset receiving a share. This ambiguity could potentially lead to a higher purchase intention for human-made labeled art. However, if this were a significant issue, we would expect to see differences in our control groups, which we did not observe.
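Such an equivalence analysis could take the form of two one-sided tests (TOST; Lakens et al. 2018). The sketch below uses hypothetical summary statistics and an illustrative equivalence bound of ±0.3 scale points, with a normal approximation standing in for the t distribution (reasonable at these group sizes):

```python
import math

def tost_equivalence(m1, s1, n1, m2, s2, n2, bound):
    """Two one-sided tests (TOST) of equivalence within +/- `bound`.

    Normal approximation to the t distribution (fine for groups of a
    couple hundred participants). Returns the larger of the two
    one-sided p values; equivalence is claimed if it is below alpha.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = m1 - m2
    sf = lambda z: 0.5 * math.erfc(z / math.sqrt(2.0))  # normal upper tail
    p_lower = sf((diff + bound) / se)   # H0: diff <= -bound
    p_upper = sf((bound - diff) / se)   # H0: diff >= +bound
    return max(p_lower, p_upper)

# Hypothetical summary statistics resembling the C-AI vs. Ex-AI comparison,
# with an illustrative equivalence bound of 0.3 scale points:
p = tost_equivalence(3.07, 1.2, 250, 3.09, 1.2, 250, 0.3)
print(p < 0.05)  # True -> the groups would be declared statistically equivalent
```

With these illustrative inputs, both one-sided tests reject their respective null hypotheses, so the observed difference would be declared smaller than the chosen bound; the substantive choice, of course, is what bound counts as a negligible difference.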

Further, our study was conducted in Germany, a Western country. Interestingly, Wu et al. (2020) found a positive bias toward human art in the US but not in China. It is possible that culturally determined differences in perceptions of AI exist, and these differences may reinforce the positive bias or make it disappear altogether. Germany in particular is a thoroughly technology- and AI-skeptical country (Gondlach and Regneri 2023). Therefore, replicating this study in countries with different conditions and circumstances seems necessary for broader generalization. Moreover, although we constructed an implicit competition scenario, and the outcomes imply that perceptions of competition contributed in part to the positive bias toward human art, we did not inquire about perceptions of this competition directly. Future studies should directly ask participants whether they perceived a threat to human artists. There is a distinct possibility that the positive bias toward humans in the realm of the arts is transitory and that individuals will acclimate to the presence of AI. A subsequent replication would therefore be advantageous for investigating the durability of these findings; incidentally, this point is relevant to all studies referenced in this article, since algorithmic systems are already prevalent in everyday life in various forms and are becoming ever more so (Zabel and Otto 2024).

5.5 Conclusion

In our study, we sought to get to the bottom of diverging research on a negative bias toward AI-generated art with respect to perceived monetary and aesthetic value. Furthermore, we tried to identify interindividual differences that could explain this bias. To the best of our knowledge, there are only two other studies on the latter (Bellaiche et al. 2023; Millet et al. 2023). Regarding the former aspect, we successfully demonstrated that the lower evaluation of AI-generated art arises only when a subtle scenario involving competition is introduced through a direct comparison of AI-generated and human-created art, as indicated by the absence of such differences in our control groups, along with the noteworthy distinction observed between the two groups labeled as being created by human artists. Thus, there was no devaluation of AI-generated art, and it is more accurate to speak of a positive bias toward humans, at least under the conditions and circumstances presented in our study. It should be emphasized that we did not actively create a competition scenario; rather, the competition came about only through the direct juxtaposition of the art forms. If we had additionally increased the salience of the competition, the differences in the competition condition might have been even greater. Nonetheless, it would be unwise to presume that the positive bias toward human-created art in terms of purchasing intentions and aesthetic judgment induced by a perception of competition will remain static. Over time, it is plausible that this perception might diminish and the co-existence of the two types of art will become the norm. Indeed, parallels could be drawn with the debate over photography at the end of the nineteenth century (Stieglitz 1892). Nowadays, photography has long been established as a form of art. AI-generated art may achieve a similar evolution.
It is conceivable that this trend has already begun, that individuals with a penchant for art are exhibiting an apparent fascination with AI-generated art, and that the societal debate around AI-generated art is becoming less negative (Bran et al. 2023).

Finally, we were able to show that PT, one of the core facets of human empathy (Davis 1996), was partly responsible for the difference we found in the competition condition with respect to purchasing intentions: people with higher PT valorized human-made images in the competition condition. Furthermore, attitudes toward AI had a significantly different effect, with respect to aesthetic judgment, on images with an AI label than on those supposedly created by a human artist. Here, more positive attitudes toward AI led to better ratings of AI-labeled art, a process that ran opposite to the one for PT. Hence, it seems that the ability to adopt another's perspective matters only when it comes to the actual remuneration of the artists, whereas attitudes toward AI exert their greater positive influence only on the aesthetic evaluation of the art. The anticipated effects were not observed in the context of altruism.

5.6 Constraints of generality

This paragraph explicitly defines the target population of the present study (Simons et al. 2017). We conducted the research in Germany with a representative sample, so the results may be generalized to the broader German public and potentially to other contexts and cultures similar to Germany. However, cultures and societies diverging significantly from the German one (e.g., with more or less AI affinity) might yield different results. Further, we tested the general public rather than art and/or AI enthusiasts specifically; people with a particular interest in art and/or AI might respond differently.