1 Introduction

Artists in the contemporary fine art market must make many decisions about how to present their work for sale. Although these choices are crucial to an artist’s long-term commercial success, artists receive limited guidance on how to present their work online to its best advantage. Can the price an artist asks for a work be changed by altering minor aspects of what potential customers see? What drives the price of a work of fine art?

The challenge of objective valuation is compounded by the complex nature of pricing in the art market, for which a number of competing theories exist. As [35] observes:

In a market that has to reconcile a fierce opposition between commercial and artistic values, and that has to commodify goods whose essence is considered to be non-commodifiable, dealers, artists, and collectors find ways to express and share non-economic values through the economic medium of pricing.

This work proposes a method for developing a set of action rules that suggest courses of action artists can take to improve the sales potential of their work. These rules reclassify artworks from one price level to a higher one. The paper first describes the development of the dataset and the construction of features for generating action rules, then contrasts the rules obtained from different feature sets, and concludes by discussing the future potential of this line of research. It discusses possible sets of flexible attributes and proposes a basic set of stable attributes [24]. Stable attributes are generally aspects of an item that are unlikely or difficult to change; in the context of this work, they represent the physical aspects of the artwork and fundamental qualities such as its medium. Flexible attributes are features that an interested party can alter to change an object’s classification; in the context of this work, they represent aspects of how artists present themselves to potential customers.

2 Related Work

Applying computing techniques to artworks has received considerable attention from the academic community. Some researchers have addressed the challenge of building a recommender system that picks out artworks similar to a given one for a user [29], while others have focused on automatically tagging paintings with emotions based on the colors they contain [15].

Art price valuation has received more limited attention. One notable project in price prediction is Hosny’s work [13]. As discussed in [7], one trend explored in that work was that artworks featuring certain colors, specifically blacks, whites, and grays, were more likely to have higher sales valuations. Other approaches to price prediction include the one used in [18] to value Rothko paintings. In [11], hedonic regression, a statistical modelling method based on the characteristics of the object in question, was combined with past sales to predict art value.

Action rules were first proposed in [24] as a method of reclassifying objects from one group to another by changing the values of their flexible attributes. They have been applied to medical data, as in [12], and to business problems, as in [24]. Other work has extended the original methodology, for example by considering the cost and feasibility of implementing action rules [33, 34]. The cost of an action rule, initially proposed in [34], represents the average cost of changing an attribute from its initial value to another specific value.

This work uses LISp-Miner’s Ac4ft-Miner tool, introduced in [25], discussed in [19], and developed by [26]. Ac4ft-Miner is an implementation of the GUHA method for action rule generation [25]. In this method, objects resembling traditional association rules are extracted from the dataset and used to form contingency tables [25]. Two of these objects with matching stable attributes are used to construct an action rule. One, termed here the ‘before rule’, contains the stable and flexible attributes of the object before any action is taken; the other, termed here the ‘after rule’, contains the desired flexible attributes and the same stable attributes. Each of these is an association rule of the form \(\varphi \approx \psi \), consisting of an antecedent \(\varphi \) and a succedent \(\psi \), where the symbol \(\approx \) represents the association between \(\varphi \) and \(\psi \) [25]. Action rules, called G-Action Rules, are then created by examining dependencies and similarities across the rules generated in the previous step [25].
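As a rough illustration of how a before rule and an after rule can be evaluated together, the following Python sketch builds a four-fold contingency table for each rule. It is not LISp-Miner’s implementation: it assumes that support is the number of tuples satisfying both antecedent and succedent (cell a) and that confidence is a / (a + b), and the attribute names and toy records are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Condition = Callable[[Row], bool]


@dataclass
class FourFoldTable:
    a: int  # antecedent and succedent both hold
    b: int  # antecedent holds, succedent does not
    c: int  # succedent holds, antecedent does not
    d: int  # neither holds

    @property
    def support(self) -> int:
        # Assumed definition: number of tuples matching the whole rule.
        return self.a

    @property
    def confidence(self) -> float:
        # Assumed definition: share of antecedent matches that also satisfy the succedent.
        return self.a / (self.a + self.b) if self.a + self.b else 0.0


def four_fold(rows: List[Row], antecedent: Condition, succedent: Condition) -> FourFoldTable:
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for row in rows:
        ant, suc = antecedent(row), succedent(row)
        if ant and suc:
            counts["a"] += 1
        elif ant:
            counts["b"] += 1
        elif suc:
            counts["c"] += 1
        else:
            counts["d"] += 1
    return FourFoldTable(**counts)


def evaluate_action_rule(
    rows: List[Row],
    stable: Condition,
    flex_before: Condition,
    flex_after: Condition,
    price_before: Condition,
    price_after: Condition,
) -> Tuple[FourFoldTable, FourFoldTable]:
    """Evaluate the before rule (stable AND flex_before ~ price_before) and
    the after rule (stable AND flex_after ~ price_after)."""
    before = four_fold(rows, lambda r: stable(r) and flex_before(r), price_before)
    after = four_fold(rows, lambda r: stable(r) and flex_after(r), price_after)
    return before, after


# Hypothetical toy records: medium is stable, the Instagram link is flexible.
rows = [
    {"medium": "oil", "instagram": False, "price_level": 3},
    {"medium": "oil", "instagram": False, "price_level": 3},
    {"medium": "oil", "instagram": False, "price_level": 4},
    {"medium": "oil", "instagram": True, "price_level": 4},
    {"medium": "oil", "instagram": True, "price_level": 4},
    {"medium": "oil", "instagram": True, "price_level": 3},
    {"medium": "print", "instagram": True, "price_level": 4},
]
before, after = evaluate_action_rule(
    rows,
    stable=lambda r: r["medium"] == "oil",
    flex_before=lambda r: not r["instagram"],
    flex_after=lambda r: r["instagram"],
    price_before=lambda r: r["price_level"] == 3,
    price_after=lambda r: r["price_level"] == 4,
)
print(before.support, round(before.confidence, 2))  # 2 0.67
print(after.support, round(after.confidence, 2))    # 2 0.67
```

Keeping the two tables separate mirrors the paper’s practice of reporting support and confidence for the before and after rules independently.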

3 Dataset

The dataset used in this work is an expanded version of the one used in [21,22,23]. Our work builds on the ideas discussed in those works by applying action rule mining to the original dataset and the proposed feature sets.

We use a dataset of artworks and artist information collected from the online art sales site Artfinder.com [5]. Artfinder represents artists from all over the world, covers a diverse range of styles and subjects, and functions as a platform that lets artists sell directly to consumers. Using a single source for artworks allows the tags attached to them to be defined more consistently. Python was used to fetch the pages and for all other major processing steps, with Beautiful Soup [2] used to parse the pages and Selenium [4] used to handle dynamic webpages and JavaScript. The resulting dataset contains approximately 200,000 artworks from approximately 3,300 artists.

Every artwork posted on Artfinder has a dedicated page describing the work. Each page includes at least one photograph of the work, its descriptive information, and the tags the artist selected. Potential customers use these tags to search for works; the tags are simple and specify features such as the artistic style, subject, or medium of the artwork. Some of the analysis methods discussed below use the primary image of the artwork from the product page. A small number of artwork images could not be retrieved, so those artworks are omitted from the following analysis.

Artists on Artfinder create profiles describing their backgrounds and achievements. The profiles have places for artists to provide a biography, links to their social media, and descriptions of achievements in their lives. Some artists have reviews posted by customers and are rated on a five-star scale, with five stars being the best. Most artists with one or more reviews have ratings of four stars or higher, and very few have any low reviews: the average rating among artists with star ratings is 4.904. This makes star ratings of very little value as predictive features.

4 Methodology

The decision feature considered here, which forms the succedent of the G-Action Rule, is the listed asking price of the artwork. Artists would change how their artworks are presented to the public in order to increase their sale price, and should be cautious about changes that could lower it. The prices were discretized into ten levels using LISp-Miner [26], with the cuts placed automatically to create partitions containing approximately equal numbers of tuples. These are referred to in later sections as levels one through ten. In this work, price is treated as a flexible attribute.
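LISp-Miner’s own discretization tool produced the price levels. As a rough, tool-independent sketch of the same idea, equal-frequency binning sorts the values and places each cut at a quantile so that every level holds about the same number of tuples. The function names and the tie-handling rule (dropping duplicate cuts) are assumptions, and LISp-Miner may place its cuts differently.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_frequency_cuts(values: Sequence[float], n_levels: int = 10) -> List[float]:
    """Return up to n_levels - 1 cut points so that each level holds roughly
    len(values) / n_levels values."""
    ordered = sorted(values)
    n = len(ordered)
    if n < n_levels:
        raise ValueError("need at least as many values as levels")
    cuts: List[float] = []
    for i in range(1, n_levels):
        cut = ordered[(i * n) // n_levels]
        # Tied values can produce the same cut twice; keep only distinct cuts.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def to_level(value: float, cuts: Sequence[float]) -> int:
    """Map a value to a 1-based level; a value equal to a cut falls in the higher level."""
    return bisect_right(cuts, value) + 1


prices = list(range(1, 101))  # hypothetical asking prices
cuts = equal_frequency_cuts(prices)
levels = [to_level(p, cuts) for p in prices]
print(cuts)                                     # [11, 21, 31, 41, 51, 61, 71, 81, 91]
print([levels.count(k) for k in range(1, 11)])  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

With heavily tied data (many works listed at the same round price), fewer than ten distinct cuts may survive, which is one reason real tools handle ties with more care than this sketch.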

This work defines a set of stable attributes, used in the antecedent of the G-Action Rule, that describe a work of art. The feature set takes inspiration from those used in [20]. The medium of the artwork is used as a stable feature: when posting an artwork, the artist tagged it with a medium for search purposes. To keep the number of options within reasonable bounds, the media were grouped into seven categories and one “NA” category. The artistic style and the subject of the work are likewise tags selected by the artist, and both are used as stable attributes. As discussed by Pawlowski [20], the size of a work is highly relevant to its price, so the height and width of each work were discretized into five bins of roughly equal size using the LISp-Miner [26] discretization tool and used as stable attributes. The presence or absence of visible reviews was also used as a stable binary feature; because artists’ review scores are consistently very high whenever they are present, the scores themselves have little utility as a quality metric. Finally, the percentage of edges in the work was used as a stable attribute after being discretized. To calculate it, Canny edge detection [10], as implemented in OpenCV [32], was applied to the primary artwork image. This algorithm determines whether an individual pixel represents an edge based on the intensity gradient around it, that is, how sharply pixel values change in its neighborhood. The edge pixels and non-edge pixels are then counted, and the percentage of edge pixels was used as the feature. This gives a rough indication of the amount of detail or color variation in the work. A small percentage of artworks had errors when an attempt was made to retrieve the photograph, so those tuples were removed from consideration.
The full list of stable attributes is given below.

  • Artistic Style

  • Artistic Subject

  • Medium

  • Height

  • Width

  • Artist Has Visible Reviews

  • Percentage of the Artwork Representing Edges

In addition, the use of color in the artwork was explored as an extra stable feature in one of the rule sets. The pixels of each artwork were clustered across the RGB dimensions with K-Means, using OpenCV [32], to determine the 10 most frequently appearing colors. The centroid of the largest of these ten clusters was then compared against 11 reference colors using the CIEDE2000 color difference measure [30], as implemented in [31], and the closest reference color was selected. The reference colors are based on the idea of universal basic colors first proposed in [8]. Although that work has been challenged and the proposed colors have been refined, for example in [17, 28], the concept of basic colors remains useful for classification. The 11 reference colors are white (R 255, G 255, B 255), gray (R 128, G 128, B 128), black (R 0, G 0, B 0), red (R 255, G 0, B 0), orange (R 255, G 128, B), yellow (R 255, G 255, B 0), green (R 0, G 255, B 0), blue (R 0, G 0, B 255), purple (R 128, G 0, B 128), pink (R 255, G 192, B 203), and brown (R 63.8, G 47.9, B 31.9). The RGB values, other than brown, are taken from [3]; brown comes from [16]. The selected color, referred to as the main color, is used as a stable attribute representing the single color that takes up the largest portion of the work.

The flexible attributes explored here, which also form part of the antecedent of the rules, center on how a work will be perceived by a consumer. How long is the artist’s biography? Does the artist have a presence on social media? What is the tone of their writing? These aspects are easy to change for an artist hoping to improve their sales. This gives them a very low cost, so stakeholders may be more willing to try the recommended rules. The full list of flexible features used is given below.

  • Word Count of the Artist Biography (Bio. WC)

  • Word Count of the Artwork Description (Desc. WC)

  • Social Media (Considered Together as SM)

    • Artist Listed a Facebook Profile

    • Artist Listed a Twitter Profile

    • Artist Listed an Instagram Profile

  • Positive Sentiment Level of Biography (Bio. Ps.)

  • Negative Sentiment Level of Biography (Bio. Ng.)

  • Positive Sentiment Level of Artwork Description (Desc. Ps.)

  • Negative Sentiment Level of Artwork Description (Desc. Ng.)

The number of words in the artist biography and in the artwork description are used as predictive features. As discussed in [21,22,23], word count has some utility as a price-predicting feature. In other areas of online sales, the length and wording of a description can affect the saleability of an item; for example, the authors of [27] analyzed sales on eBay and found that the length of the description could affect the price.

Similarly, many collectors find artists through social media [1], which has become an increasingly important tool for helping artists be discovered. Can adding or removing a link to a social media profile change a collector’s opinion?

The sentiment of the text was determined using VADER (Valence Aware Dictionary and sEntiment Reasoner), which is included in the Python Natural Language Toolkit [9, 14]. The objective was to determine the polarity of the text’s sentiment. A lexicon-based strategy of this kind determines a text’s sentiment from a set of words and phrases called an ‘opinion lexicon’ [6]. VADER combines an opinion lexicon with a set of rules [14]: the lexicon includes thousands of candidate terms rated by humans on an intensity scale, and the rules account for the effects of capitalization and negation [14]. Each piece of text received positive, negative, and neutral scores, which were discretized and used as flexible attributes. The sentiments found were largely neutral, with only a few pieces of text containing strongly emotional language.
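The flexible attributes for one artwork can be assembled roughly as follows. This is a sketch under several assumptions: word counts are whitespace-delimited, and the sentiment cut points and level names are hypothetical, since the paper discretized its scores but does not list the cuts. The default scorer is NLTK’s VADER `SentimentIntensityAnalyzer.polarity_scores`, which returns `neg`, `neu`, `pos`, and `compound` scores and requires the `vader_lexicon` resource (`nltk.download("vader_lexicon")`). The scorer is a parameter so the sketch can be exercised without NLTK.

```python
from bisect import bisect_right
from functools import lru_cache
from typing import Callable, Dict, Sequence

Scores = Dict[str, float]

# Hypothetical cut points for VADER's "pos"/"neg" proportions (0.0 to 1.0).
SENTIMENT_CUTS: Sequence[float] = (0.05, 0.15, 0.30)
SENTIMENT_LEVELS: Sequence[str] = ("none", "low", "medium", "high")


@lru_cache(maxsize=1)
def _analyzer():
    # Imported lazily so the rest of the sketch runs without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vader_scores(text: str) -> Scores:
    return _analyzer().polarity_scores(text)


def word_count(text: str) -> int:
    """Whitespace-delimited word count."""
    return len(text.split())


def sentiment_level(score: float) -> str:
    return SENTIMENT_LEVELS[bisect_right(SENTIMENT_CUTS, score)]


def flexible_attributes(
    bio: str,
    description: str,
    has_facebook: bool,
    has_twitter: bool,
    has_instagram: bool,
    scorer: Callable[[str], Scores] = vader_scores,
) -> Dict[str, object]:
    bio_scores, desc_scores = scorer(bio), scorer(description)
    return {
        "Bio. WC": word_count(bio),  # raw counts; discretized later, like price
        "Desc. WC": word_count(description),
        "SM": {"Facebook": has_facebook, "Twitter": has_twitter, "Instagram": has_instagram},
        "Bio. Ps.": sentiment_level(bio_scores["pos"]),
        "Bio. Ng.": sentiment_level(bio_scores["neg"]),
        "Desc. Ps.": sentiment_level(desc_scores["pos"]),
        "Desc. Ng.": sentiment_level(desc_scores["neg"]),
    }


def fixed_scorer(text: str) -> Scores:
    # Stand-in scores for demonstration; not real VADER output.
    return {"neg": 0.0, "neu": 0.8, "pos": 0.2, "compound": 0.5}


attrs = flexible_attributes(
    "I paint coastal landscapes in oil.", "A calm harbour at dusk.",
    True, False, True, scorer=fixed_scorer,
)
print(attrs["Bio. WC"], attrs["Bio. Ps."], attrs["Bio. Ng."])  # 6 medium none
```

With real data, calling `flexible_attributes(bio, description, fb, tw, ig)` without a `scorer` argument uses VADER once the lexicon has been downloaded.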

Using the LISp-Miner Ac4ft-Miner tool, sets of rules were generated for combinations of prices and flexible attributes with a single set of stable attributes used throughout for consistency. As the goal is to move prices from lower to higher price points, rules were generated to transition items from lower price levels to higher price levels. Rules generated using this method have support and confidence scores for both the before state association rule and after state association rule that are used together to construct the final action rule.

5 Results

The first set of rules was generated with no minimum confidence and with a requirement that each rule apply to 50 or more tuples in the dataset. Rules were generated for each flexible attribute alone, and for the social media features both alone and in combination. This work compares the results of the different flexible attributes in isolation to explore which have the greatest impact on the confidence and support of the resulting rules. Each set of rules contains an average of 1,700 action rules, and at least one rule applies to approximately 97% of the artworks at the relevant price level. This proportion is termed the coverage of the rule set and serves as a measure of its applicability. However, the confidence of these rules is quite low: while some individual rules do have high confidence, the average confidence of the before rules is approximately 13%. Because all the features discussed here have an extremely low cost, exploring low-confidence rules is not a concern; to make the changes these rules suggest, an artist would only need to rewrite their artwork descriptions or change their profile.

In response to the low confidence of the initial rule sets, additional rule sets were generated under tighter constraints. Rules were generated only to move each artwork up to the next higher price level. The social media flexible features (abbreviated in the tables as SM), the word counts (WC) of the artist’s biography (Bio.) and of the description (Desc.), and the text polarity features (Ps. for positive sentiment and Ng. for negative sentiment) were each used separately to generate rules. The minimum support for a rule was lowered from 50 to 2, but a minimum confidence of 60% was required for both the before and after rules. The stable portion of each rule contains between one and five attributes. The resulting rule sets had a much lower average coverage of 9.375%, but a much higher average confidence of 79.16% for the before rules and 79.2% for the after rules. Fewer rules were generated, with an average of 1,155 rules per group. Table 1 displays the coverage of each rule set for the selected price levels and flexible attributes. Coverage here is the percentage of objects at the lower price level that had at least one applicable rule to raise them to the selected higher price level.
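The constraints and the coverage measure described above can be sketched as a filter over candidate action rules followed by a coverage count. The `ActionRule` record, the attribute names, and the toy data are hypothetical. The sketch also assumes that a rule applies to an artwork when the artwork matches both the rule’s stable conditions and its before-state flexible conditions, and that support is counted in tuples.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class ActionRule:
    stable: Dict[str, object]       # e.g. {"Medium": "oil"}
    flex_before: Dict[str, object]  # flexible values before the change
    flex_after: Dict[str, object]   # desired flexible values
    price_from: int
    price_to: int
    support_before: int
    confidence_before: float
    support_after: int
    confidence_after: float


def passes_constraints(rule: ActionRule, min_support: int = 2,
                       min_confidence: float = 0.6, max_stable: int = 5) -> bool:
    """Next-level moves only, one to five stable attributes, and minimum
    support and confidence on both the before and after rules."""
    return (
        rule.price_to == rule.price_from + 1
        and 1 <= len(rule.stable) <= max_stable
        and min(rule.support_before, rule.support_after) >= min_support
        and min(rule.confidence_before, rule.confidence_after) >= min_confidence
    )


def matches(row: Row, conditions: Dict[str, object]) -> bool:
    return all(row.get(key) == value for key, value in conditions.items())


def coverage(rows: List[Row], rules: List[ActionRule], level_from: int, level_to: int) -> float:
    """Percentage of artworks at level_from with at least one applicable rule
    that moves them to level_to."""
    at_level = [row for row in rows if row["Price"] == level_from]
    relevant = [r for r in rules if r.price_from == level_from and r.price_to == level_to]
    if not at_level:
        return 0.0
    covered = sum(
        1 for row in at_level
        if any(matches(row, r.stable) and matches(row, r.flex_before) for r in relevant)
    )
    return 100.0 * covered / len(at_level)


# Hypothetical toy data.
rows = [
    {"Medium": "oil", "Bio. WC": "short", "Price": 1},
    {"Medium": "oil", "Bio. WC": "short", "Price": 1},
    {"Medium": "oil", "Bio. WC": "long", "Price": 1},
    {"Medium": "print", "Bio. WC": "short", "Price": 1},
]
candidate_rules = [
    ActionRule({"Medium": "oil"}, {"Bio. WC": "short"}, {"Bio. WC": "long"}, 1, 2, 5, 0.70, 4, 0.65),
    ActionRule({"Medium": "print"}, {"Bio. WC": "short"}, {"Bio. WC": "long"}, 1, 2, 3, 0.40, 3, 0.80),
]
kept = [r for r in candidate_rules if passes_constraints(r)]
print(len(kept), coverage(rows, kept, 1, 2))  # 1 50.0
```

The second toy rule is discarded because its before-rule confidence falls below 60%, which is exactly the trade-off reported above: stricter confidence removes rules and therefore lowers coverage.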

Table 1. Coverage of rules generated using base stable features

The level of coverage varies dramatically depending on the price level being addressed. The price ranges for levels one through ten are as follows: (<12.97–58.28), (58.28–95.90), (95.90–130.04), (130.04–188.13), (188.13–250), (250–350), (350–490), (490–742), (742–1351.24), and (1351.24–>1,000,000). For example, the coverage of the rule sets that move artworks from the lowest price level, level one, to level two ranges between 19.24% and 38.89% across the different sets of flexible attributes. Notably, the social media attributes performed markedly worse than the others, and the attributes that change an artwork’s description had less consistently high coverage than those that address an artist’s biography. Coverage decreases at higher price levels. This may be because the price ranges, and hence the shifts needed to move a work from one tier to the next, become wider at the higher levels.
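Using the cut points listed above, a price can be mapped to its level, and the widths of the bounded tiers can be compared, which offers one way to inspect the suggestion that higher tiers require larger shifts. The sketch assumes that a price equal to a cut point belongs to the higher level; the text does not specify how boundaries are assigned.

```python
from bisect import bisect_right

# Upper bounds of price levels one through nine, as listed above;
# level ten is open-ended.
PRICE_CUTS = (58.28, 95.90, 130.04, 188.13, 250, 350, 490, 742, 1351.24)


def price_level(price: float) -> int:
    """Return the price level (1-10); a price equal to a cut goes to the higher level (assumed)."""
    return bisect_right(PRICE_CUTS, price) + 1


# Widths of the bounded tiers (levels two through nine).
tier_widths = [round(hi - lo, 2) for lo, hi in zip(PRICE_CUTS, PRICE_CUTS[1:])]

print(price_level(300))  # 6
print(tier_widths)       # [37.62, 34.14, 58.09, 61.87, 100, 140, 252, 609.24]
```

From level five upward each tier is wider than the one before it, consistent with the explanation offered for the drop in coverage.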

To expand the set of stable attributes, another set of rules was generated that added the dominant color of the work to the stable attributes. This set was generated from a randomly selected subset of approximately 100,000 tuples, for which LISp-Miner’s discretization tools were used to create a new set of partitions similar to those used previously. The same restrictions on rule generation were applied, with both the before and after rules required to reach a minimum confidence of 60%, except that the minimum support was set to 10. More rules were generated per set than in the previous variation, with an average of 1,551 rules per group. This set also had a higher average coverage of 15.30%, compared with 9.375%, and a similar average confidence of 78.81% for the before rules and 79.07% for the after rules. As with the previous set of rules, coverage shifts across price levels: the coverage of each rule group depends on the selected flexible attributes and price levels, as shown in Table 2.

Table 2. Coverage of rules generated using base stable features and main color

6 Conclusions and Future Work

This work only begins to address the potential for the use of action rules to improve artist sales in the market for contemporary fine art. Many potential avenues for further research exist in the development of features for action rules.

In the initial set of results, the very high coverage is promising. However, although specific rules have high confidence, the average confidence is quite low. This may be because too many low-confidence rules were allowed, or because the partitions of the feature sets are overly broad. Low confidence does not, however, prevent a set of G-Action Rules from being useful to an artist: these rules are extremely low cost and easily implemented, so stakeholders may want to try them even on the small chance that they will provide an improvement. The second and third sets of rules demonstrate that it is possible to significantly raise the average confidence of the rules, but at the expense of lowering the average coverage of the rule sets.

One strong avenue for future research is the development of more features for rule generation. Developing more flexible attributes, as well as exploring the attributes discussed here in combination, has potential value for stakeholders in the art market. Beyond artwork features, artists have characteristics that may serve as additional stable attributes. A broader exploration of larger-scale career changes an artist could make also holds great potential for research.