1 Introduction

Artists in the contemporary market for fine art must make many decisions about how to present their work for commercial sale. While these choices are crucial to artists’ long-term commercial success, artists are given limited guidance on how to present their work online to its best advantage. Can the price that an artist asks for a work be altered by changing minor aspects of what potential customers see? What drives the price of a work of fine art?

The challenge of objective valuation is worsened by the complex nature of pricing in the art market. There are a number of theories on how pricing in the art market works. As said in Velthuis (2005),

In a market that has to reconcile a fierce opposition between commercial and artistic values, and that has to commodify goods whose essence is considered to be non-commodifiable, dealers, artists, and collectors find ways to express and share non-economic values through the economic medium of pricing.

This work proposes a method for developing a set of action rules that can be used to suggest changes for artists that will improve the sales potential of their work. Ultimately, these rules could contribute to the construction of a decision support system for artists. These rules reclassify artworks from one price level to another higher price level. This work discusses possible sets of flexible attributes and proposes a basic set of stable attributes (Ras and Wieczorkowska, 2000). Stable attributes are generally aspects of the relevant item that are unlikely or difficult to change. In the context of this work, these represent the physical aspects of the artwork and fundamental qualities such as its medium. Flexible attributes are features that can be altered by an interested party to change its classification. In the context of this work, these represent aspects of how the artist presents themselves and their work to potential customers. This work is an extension of Powell et al. (2020c), which was presented at ISMIS 2020. It adds greater context and more experiments to further develop the original work.

2 Related work

Applying computing techniques to artworks has received considerable attention from the academic community. Some researchers have addressed the challenge of creating a recommender system that chooses artworks similar to a given one for a user (Saleh and Elgammal, 2015), while others are focused on automatically tagging a painting with emotions based on the colors represented (Kang et al., 2018).

Art price valuation has received limited attention. One interesting project in the realm of price prediction was Hosny’s work (Hosny et al., 2014). As discussed in Bailey (2017), one notable trend explored in Hosny’s work was that certain colors, specifically blacks, whites, and grays, are more likely to have a higher sales valuation. Different approaches have been used for price prediction, such as the approach used by Liu and Woodham (2019) to value Rothko paintings. In Galbraith and Hodgson (2018), hedonic regression, a statistical modelling method based on characteristics of the object in question, combined with past sales was used to predict art value.

Action rules were first proposed in Ras and Wieczorkowska (2000) as a method of reclassifying objects from one group to another group by changing values of their flexible attributes. They have been used for medical data, such as in Tarnowska et al. (2017), Mardini and Raś (2019), and Hajja et al. (2014); for business purposes, as in Ras and Wieczorkowska (2000) and Tarnowska et al. (2020); and for social networks (Kalanat and Khanjari, 2020). Some have explored expansions on the original methodology, such as considering the cost and feasibility of the implementation of action rules as discussed in Tzacheva and Raś (2005) and Tzacheva et al. (2017). The cost of an action rule, which was initially proposed in Tzacheva and Raś (2005), represents the cost of changing flexible attribute values listed in the rule from their initial values to other specific values.

This work uses LISp-Miner’s Ac4ft-Miner tool, introduced in Rauch and Šimůnek (2009), discussed in Nekvapil (2009), and further developed in Rauch et al. (2019). The Ac4ft-Miner tool is an implementation of the GUHA method for action rule generation (Rauch and Šimůnek, 2009). In this method of action rule construction, objects resembling traditional association rules are extracted from the dataset and used to form contingency tables (Rauch & Šimůnek, 2009). Two of these objects, with matching stable attributes, are used to build the action rules. One rule, termed here the “before rule”, has the stable and flexible attributes of the object before any actions are taken, and the other rule, called the “after rule” here, has the desired set of flexible attributes and matching stable ones. Rules generated in this manner are association rules consisting of an antecedent and a succedent, written in the form φ ≈ ψ, where the symbol ≈ represents the association between φ and ψ (Rauch & Šimůnek, 2009). These associations are then used to create action rules, called G-Action Rules, through the examination of dependencies and similarities across the rules generated in the previous step (Rauch & Šimůnek, 2009). This strategy of action rule discovery is similar to the one presented in Tsay and Raś (2005).

As a hypothetical example, using the features addressed in this work, an association may be found in which an artwork in a realistic style with a very short description tends to have a very low price. Another association might be found in which an artwork in a realistic style with an average-length description tends to have a considerably higher price. In this case, the style would be a stable attribute, and the length of the description would be a flexible attribute. As raising the price of an artwork is advantageous to the artist, we term the lower-priced rule the before rule and the higher-priced rule the after rule. Assuming that both rules have sufficient support and confidence, it would be possible to state an action rule: if an artwork is in a realistic style and the length of its description changes from very low to average, then the price will rise from one level to another.
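The pairing described in this example can be illustrated with a minimal Python sketch. It is not the Ac4ft-Miner implementation; the class `AssociationRule`, the function `build_action_rule`, and the attribute names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    """A simplified association rule (hypothetical structure, not LISp-Miner's)."""
    stable: frozenset    # e.g. {("style", "realistic")}
    flexible: frozenset  # e.g. {("desc_wc", "very low")}
    price_level: int


def build_action_rule(before, after):
    """Pair two rules into an action rule, or return None if they do not qualify."""
    # The stable attributes must match exactly.
    if before.stable != after.stable:
        return None
    # Only moves to a higher price level are of interest to the artist.
    if after.price_level <= before.price_level:
        return None
    before_flex = dict(before.flexible)
    after_flex = dict(after.flexible)
    # Record every flexible attribute whose value differs between the two rules.
    changes = {
        name: (before_flex[name], after_flex[name])
        for name in before_flex
        if name in after_flex and before_flex[name] != after_flex[name]
    }
    if not changes:
        return None
    return {
        "stable": dict(before.stable),
        "changes": changes,
        "price": (before.price_level, after.price_level),
    }
```

For the realistic-style example, a before rule with a very low description length at price level 1 and an after rule with an average description length at a higher level would yield a single change on the description-length attribute.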

3 Dataset

The dataset used in this work is an expanded version of the one used in Powell et al. (2019), Powell et al. (2020a), and Powell et al. (2020b). Our work expands on the ideas discussed in these works by adding action rules. The following section addresses the construction of the dataset as originally discussed in the preceding works.

We use a dataset of artworks and artist information collected from the website Artfinder.com (Artfinder.com, 2019). Artfinder represents artists from all over the world and has diverse styles and subjects available. Artfinder has paintings, prints, photographs, sculptures, and drawings posted for collectors. Using a single source for artworks allows for more consistent definitions of tags. The information was scraped using Beautiful Soup (Beautiful Soup, 2019) to parse the pages, Selenium (Selenium, 2019) to work with dynamic webpages and JavaScript, and Python to fetch pages and perform all other major steps. The dataset contains approximately 200,000 artworks from approximately 3,300 artists.

Each artwork posted on Artfinder has a dedicated page describing the work. This page consistently has at least one photograph of the work and has its descriptive information, as well as the tags that the artist selected. These tags are used by potential customers to search for works. Some of the analysis methods discussed below use the primary image of the artwork from the product page. A small number of artwork images could not be retrieved, so they are omitted from the following analysis.

Artists on Artfinder create profiles of themselves discussing their achievements and backgrounds. The profiles have places for artists to provide their biographies, links to their social media, and descriptions of achievements in their lives. A number of artists have reviews posted from customers and are rated on a star system with five stars being the best. The majority of artists with one or more reviews have ratings of four out of five stars or higher and very few have any low reviews. The average score of all artists in the dataset with any star rating is 4.904. This makes the star ratings received by artists of very low value as predictive features.

After the initial scraping, the dataset was extended by processing both the artwork images and the associated text to form new sets of features. This processing is detailed in a later portion of this paper. Both the originally scraped data and the new features were stored as a collection of database tables. Using Python, these tables were queried, and selected attributes and tuples were converted into comma-separated files for use in LISp-Miner. For example, a set of selected features from three records has been reproduced in Table 1. These records were inserted into LISp-Miner, discretized, and used for rule generation. The set shown uses all the flexible features, the base set of stable features that were used each time, and three additional stable features of three prominent colors.

Table 1 Sample records

4 Methodology

This work focuses heavily on exploring the relative value of different sets of features. These sets are examined by testing combinations of features with varying conditions in LISp-Miner. LISp-Miner takes CSV files as input. Each file was made using a subset of the available features and a portion of the available tuples. Different methods of partitioning the dataset were also explored. Images and unstructured text required processing before they could be used as input for LISp-Miner. The forms of processing used, such as reducing the image to a group of colors or the text to a simple numeric sentiment score, are further discussed below. For each combination, LISp-Miner discretized the data and output the G-Action rules. In order to explore the impact of feature selection, a variety of thresholds for minimum support and confidence were used in rule construction. Lastly, the rule sets were examined to assess their coverage of the input tuples and their average level of confidence.

The decision feature considered here, which represents the succedent of the G-Action Rule, is the listed asking price of the artwork. Artists could attempt to make changes to how their artworks are presented to the public in order to increase their sales price and should be cautious about making changes that could potentially lower their sales price. The prices were discretized into ten groups using LISp-Miner (Rauch et al., 2019). The cuts were placed automatically to create partitions containing an approximately equal number of tuples. The price attribute was partitioned into ten levels, referred to in later sections as levels one through ten. Price is treated as a flexible attribute.
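As an illustration of equal-frequency partitioning, the following Python sketch places cut points so that each bin receives roughly the same number of values. This is only an approximation of the idea; the exact algorithm used by LISp-Miner is not described here, and the function names `equal_frequency_cuts` and `assign_level` are hypothetical.

```python
import bisect


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points splitting the sorted values into bins of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_level(value, cuts):
    """Map a value to a 1-based level; a value equal to a cut point falls in the upper bin."""
    return bisect.bisect_right(cuts, value) + 1
```

With ten bins, `assign_level` returns levels one through ten. Heavily repeated prices can make the bins uneven, which is one reason the resulting partitions are only approximately equal in size.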

This work defines a set of stable attributes that are used in the antecedent portion of the G-Action Rule. The stable portion of a rule can be read as a description of a work of art. The features chosen for the rules take inspiration from the sets used in Pawlowski et al. (2019). The chosen medium of the artwork is used as a stable feature. When the artwork was posted, the artist tagged it with a medium for search purposes. In order to keep the number of options within reasonable bounds, the mediums were discretized into seven categories and one “NA” category. The artistic style of the work and the artist’s subject are similarly tags and are used as stable attributes. As discussed in Pawlowski et al. (2019), the size of a work is very relevant to determining its price, so the height and width of the works were discretized into five bins of roughly equal size using the LISp-Miner (Rauch et al., 2019) discretization tool and used as stable attributes. Additionally, the presence or absence of visible reviews was used as a stable binary feature. The review scores of the artists are consistently very high if they are present at all, which limits the utility of using the score as a quality metric. Lastly, the percentage of edges in the work was utilized as a stable attribute after being discretized. To calculate this, Canny edge detection (Canny, 1986), as implemented in the OpenCV set of tools (Team, 2017), was applied to the primary artwork image. This algorithm determines whether an individual pixel represents an edge based on the amount of color variation surrounding it. The edge pixels and non-edge pixels were then counted, and the percentage of edge pixels was used as a stable feature. This can be considered as giving a rough idea of the amount of detail or color variation in the work. A small percentage of artworks had errors when an attempt was made to retrieve the photograph, so those tuples were removed from consideration.
The list of basic stable attributes used each time is given below. Others were added and will be discussed later.

  • Artistic Style (categorical)

  • Artistic Subject (categorical)

  • Medium (categorical)

  • Height (numeric, discretized into categorical)

  • Width (numeric, discretized into categorical)

  • Artist Has Visible Reviews (boolean)

  • Percentage of the Artwork Representing Edges (numeric, discretized into categorical)

The set of flexible attributes explored, part of the antecedent of the rules, centers around how a work will be perceived by a consumer. How long is the artist’s biography? Do they have a presence on social media? What is the tone of their writing? These are easily changeable for an artist hoping to improve their sales. These changes have a very low cost, so stakeholders may be more willing to try recommended rules. The full list of flexible features used is given below.

  • Word Count of the Artist Biography (Bio. WC) (numeric, discretized into categorical)

  • Word Count of the Artwork Description (Desc. WC) (numeric, discretized into categorical)

  • Social Media (Considered Together as SM)

    • Artist Listed a Facebook Profile (boolean)

    • Artist Listed a Twitter Profile (boolean)

    • Artist Listed an Instagram Profile (boolean)

  • Positive Sentiment Level of Biography (Bio. Ps.) (numeric, discretized into categorical)

  • Negative Sentiment Level of Biography (Bio. Ng.) (numeric, discretized into categorical)

  • Positive Sentiment Level of Artwork Description (Desc. Ps.) (numeric, discretized into categorical)

  • Negative Sentiment Level of Artwork Description (Desc. Ng.) (numeric, discretized into categorical)

The number of words in the biography of the artist and in the artwork description are used as predictive features. As was discussed in Powell et al. (2019), Powell et al. (2020a), and Powell et al. (2020b), the word count does have some utility as a price-predicting feature. In other arenas of online sales, the length and wording of a description can have a bearing on the saleability of an item. In Rawlins and Johnson (2007), the authors analyzed sales on eBay and determined that the length of the description could have an impact on the price.

Similarly, many collectors find artists through social media (The Hiscox Online Art Trade Report, 2018). It has become an increasingly important tool for helping artists be discovered by collectors. Can adding or removing a link to a profile change the opinion of a collector?

The sentiment of the text was determined using VADER, the ‘Valence Aware Dictionary and sEntiment Reasoner’, which is part of the Python Natural Language Toolkit (Bird et al., 2009; Hutto & Gilbert, 2015). The objective was to determine the polarity of the sentiment of the text. This strategy uses a set of words and phrases, called an ‘opinion lexicon’, to determine the text’s sentiment (Aggarwal, 2018). VADER uses a combination of an opinion lexicon and a set of rules (Hutto & Gilbert, 2015). The lexicon includes thousands of candidate terms rated by humans on a scale, and the rule set considers the impact of capitalization and negation (Hutto & Gilbert, 2015). Each piece of text was given a score for its level of positive sentiment, its level of negative sentiment, and its level of neutral sentiment. Each attribute was discretized, and they were used as flexible attributes. The sentiments found were largely neutral, with only a few pieces of text containing strongly emotional language.

Lastly, the use of color in the artwork was explored as an additional stable feature in the rule sets. The pixels in the artwork were clustered using OpenCV (Team, 2017) across the RGB dimensions using K-Means to determine the ten most frequently appearing colors. Out of this set of ten, the centroids of the largest cluster and second largest cluster were tested against eleven reference colors using the CIEDE2000 color difference measure (Sharma et al., 2004), using the implementation of Taylor (2014). The reference colors are based on the idea of universal basic colors, which was first proposed in Berlin and Kay (1969). While this work has been challenged and the colors proposed have been refined, such as in Lindsey and Brown (2006) and Roberson and Hanley (2007), the concept of basic colors is useful for classification. The eleven reference colors are white (R 255, G 255, B 255), gray (R 128, G 128, B 128), black (R 0, G 0, B 0), red (R 255, G 0, B 0), orange (R 255, G 128, B ), yellow (R 255, G 255, B 0), green (R 0, G 255, B 0), blue (R 0, G 0, B 255), purple (R 128, G 0, B 128), pink (R 255, G 192, B 203), and brown (R 63.8, G 47.9, B 31.9). The RGB values, other than brown, are taken from Color by name (2020), and the brown comes from Labrecque and Milne (2012).

4.1 Additional stable attributes

While utilizing the basic color has value as a classification feature, many of the distinctions between colors and their subtleties are lost. Therefore, other methods of using colors were used. The following section discusses two different methods of mapping colors to specific emotions.

First, the pleasure, arousal, and dominance scales, referred to here as PAD, were used as stable attributes. The PAD matrix is a three-dimensional space with independent axes for pleasure (ranging from pain to extreme pleasure), arousal (ranging from sleepiness to excitement), and dominance (ranging from powerlessness to control) (Mehrabian, 1978; Russell & Mehrabian, 1977). Relationships between the saturation and brightness of colors and the PAD scale were found in Valdez and Mehrabian (1994). Interestingly, that work determined that hue has a much less clear relationship with this scale (Valdez & Mehrabian, 1994). Valdez and Mehrabian (1994) derived the following equations reflecting these relationships using regression analysis.

$$ \text{Pleasure} = 0.69\,\text{Brightness} + 0.22\,\text{Saturation} $$
(1)
$$ \text{Arousal} = -0.31\,\text{Brightness} + 0.60\,\text{Saturation} $$
(2)
$$ \text{Dominance} = -0.76\,\text{Brightness} + 0.32\,\text{Saturation} $$
(3)

These equations were applied to the colors extracted from each artwork using K-Means to create a weighted average score for each work. This was then discretized using LISp-Miner and treated as a stable attribute.
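A minimal Python sketch of Eqs. (1) to (3) and the weighted average follows. It assumes that the HSV saturation and value of a color, each scaled to the range 0 to 1, stand in for saturation and brightness, and that each cluster is weighted by its pixel count. Neither the color space nor the weighting scheme is specified above, so both are assumptions, and the function names are hypothetical.

```python
import colorsys


def pad_scores(rgb):
    """Apply Eqs. (1)-(3) to one RGB color given as 0-255 components."""
    r, g, b = (component / 255.0 for component in rgb)
    # Assumption: HSV saturation and value approximate saturation and brightness.
    _, saturation, brightness = colorsys.rgb_to_hsv(r, g, b)
    pleasure = 0.69 * brightness + 0.22 * saturation
    arousal = -0.31 * brightness + 0.60 * saturation
    dominance = -0.76 * brightness + 0.32 * saturation
    return pleasure, arousal, dominance


def weighted_pad(clusters):
    """Average PAD scores over (rgb, pixel_count) pairs, weighted by pixel count."""
    total = sum(count for _, count in clusters)
    sums = [0.0, 0.0, 0.0]
    for rgb, count in clusters:
        for i, score in enumerate(pad_scores(rgb)):
            sums[i] += score * count / total
    return tuple(sums)
```

Under these assumptions, pure white has zero saturation and full brightness, so its scores are simply the brightness coefficients of the three equations.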

Another method of quantifying color examined is the activity, weight, and heat scale, referred to here as AWH, originally proposed in Ou et al. (2004). This scale was designed to reduce the variations in the emotional impact of color based on culture or gender (Ou et al., 2004). These measures attempt to combine other descriptors of color (Ou et al., 2004). Color activity could be described as a measure of a color’s freshness, cleanliness, modernity, or passivity (Ou et al., 2004). Color weight is a description of its hardness, heaviness, and whether it could be described as masculine or feminine (Ou et al., 2004). Heat was included due to its importance in other studies, as determined by Ou et al. (2004).

The work of Ou et al. (2004) derived the following equations for calculating these features. The CIELab values of L*, C*, h, a* and b* are used.

$$ \text{Color Activity} = -2.1 + 0.06\Big[(L^{*}-50)^{2} + (a^{*}-3)^{2} + \Big(\frac{b^{*}-17}{1.4}\Big)^{2}\Big]^{1/2} $$
(4)
$$ \text{Color Weight} = -1.8 + 0.04(100-L^{*}) + 0.45\cos(h-100^{\circ}) $$
(5)
$$ \text{Color Heat} = -0.5 + 0.02(C^{*})^{1.07}\cos(h-50^{\circ}) $$
(6)

As discussed above, the dominant colors of each artwork were extracted using K-Means and applied to the above equations to determine a weighted average for the work. These were then discretized.
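Eqs. (4) to (6) can be sketched in Python as follows. The sketch takes CIELab coordinates as input and derives chroma C* and hue angle h from a* and b* in the standard way. The conversion from RGB to CIELab is omitted, and the function name `awh_scores` is hypothetical.

```python
import math


def awh_scores(L, a, b):
    """Return (activity, weight, heat) for one color given CIELab L*, a*, b*."""
    chroma = math.hypot(a, b)                        # C* = sqrt(a*^2 + b*^2)
    hue = math.degrees(math.atan2(b, a)) % 360.0     # h in degrees
    activity = -2.1 + 0.06 * math.sqrt(
        (L - 50) ** 2 + (a - 3) ** 2 + ((b - 17) / 1.4) ** 2
    )
    weight = -1.8 + 0.04 * (100 - L) + 0.45 * math.cos(math.radians(hue - 100))
    heat = -0.5 + 0.02 * chroma ** 1.07 * math.cos(math.radians(hue - 50))
    return activity, weight, heat
```

A per-work score could then be formed by averaging these values over the extracted colors, weighted for example by cluster size. The weighting is an assumption, as it is not specified above.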

4.2 Block averaging and clustering

The patterns of darkness and light in an image are among the most immediately obvious visual features. Further stable attributes considered were the average brightness of the image, determined using the average grayscale value of the work, the standard deviation of that brightness, and the average and standard deviation of the red, green, and blue color channels. In the following tables, the mean and standard deviation of the grayscale image are referenced as “gray”; the mean values of the red, green, and blue channels are abbreviated as “RGB M”, and the standard deviations of the same channels are abbreviated as “RGB S”.

Another method of defining a work is by analyzing these patterns. The sample photo of each work was sliced into a three-by-three grid with an approximately equal number of pixels in each partition. This resembles the method used in Lombardi (2005) to quantify artwork features.

The positions in the three-by-three grid of the brightest and darkest blocks were considered as stable attributes. These are abbreviated as “block” in the subsequent tables. In Fig. 1, an example of this method of representing an image is shown.
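The grid slicing and the selection of the brightest and darkest blocks can be sketched with NumPy as follows. The sketch assumes the image has already been loaded as a two-dimensional grayscale array; the function names are hypothetical.

```python
import numpy as np


def block_means(gray, grid=3):
    """Split a 2-D grayscale array into grid x grid blocks and return each block's mean."""
    return np.array([
        [block.mean() for block in np.array_split(band, grid, axis=1)]
        for band in np.array_split(gray, grid, axis=0)
    ])


def brightest_and_darkest(means):
    """Return the (row, column) positions of the brightest and darkest blocks."""
    brightest = np.unravel_index(np.argmax(means), means.shape)
    darkest = np.unravel_index(np.argmin(means), means.shape)
    return tuple(int(i) for i in brightest), tuple(int(i) for i in darkest)
```

`np.array_split` tolerates dimensions that are not divisible by three, which gives blocks with an approximately equal number of pixels. Ties between blocks are resolved in favor of the first block in row-major order.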

Fig. 1 A cityscape and its representation as blocks

Next, the average brightness level of each block was calculated. Then, all works by the same artist were averaged block by block. This creates a single representative set of values that represents how their specific work tends to handle darkness and light. This was then used to generate clusters that would place artists with visual similarities together. More clusters were generated using RGB values in addition to darkness and light alone, but the resulting clusters contained almost the exact same sets of artists as the gray-based clusters, so they were not used further. This method is referred to in later tables as “Gray Blocks”. The process was then repeated with the average percentage of edges per block. This method is referenced as “Edge Blocks”. This creates values that represent the overall composition of an artist’s body of work.
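The block-by-block averaging over an artist's works can be sketched as follows. The input format, an iterable of artist identifier and three-by-three array pairs, and the function name `artist_profiles` are assumptions made for illustration. The clustering step is omitted because the clustering algorithm is not specified above.

```python
from collections import defaultdict

import numpy as np


def artist_profiles(works):
    """Average per-work block values block by block for each artist.

    works: iterable of (artist_id, 3x3 array-like of block values).
    Returns a dict mapping artist_id to a 3x3 array of averaged values.
    """
    grouped = defaultdict(list)
    for artist_id, blocks in works:
        grouped[artist_id].append(np.asarray(blocks, dtype=float))
    return {artist_id: np.mean(stack, axis=0) for artist_id, stack in grouped.items()}
```

Each resulting profile can be flattened into a nine-value vector and passed to a clustering routine. The same function applies whether the block values are brightness levels or edge percentages.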

Using the LISp-Miner Ac4ft-Miner tool, sets of rules were generated for combinations of prices, flexible attributes, a single set of stable attributes used throughout for consistency, and additional stable attributes. As the goal is to move prices from lower to higher price points, rules were generated to transition items from lower price levels to higher price levels. Rules generated using this method have support and confidence scores for both the before-state association rule and after-state association rule that are used together to construct the action rule.
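As an illustration of the support and confidence scores attached to each before-state or after-state rule, the following Python sketch uses the standard association rule definitions: support is the number of tuples matching both the antecedent and the price level, and confidence is that count divided by the number of tuples matching the antecedent. The quantifiers configured in Ac4ft-Miner may differ in detail, and the record layout and function name are hypothetical.

```python
def support_and_confidence(records, antecedent, price_level):
    """Return (support count, confidence) of the rule antecedent -> price_level.

    records: list of dicts mapping attribute names to discretized values,
             each with a "price" key holding the price level.
    antecedent: dict of attribute name -> required value.
    """
    matching = [
        record for record in records
        if all(record.get(name) == value for name, value in antecedent.items())
    ]
    support = sum(1 for record in matching if record["price"] == price_level)
    confidence = support / len(matching) if matching else 0.0
    return support, confidence
```

An action rule would be retained only when both its before rule and its after rule meet the chosen minimum support and confidence thresholds.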

5 Results

The first set of rules generated had no minimum confidence for a rule and had a requirement that the rule must apply to 50 or more tuples out of the dataset. Rules were generated for each flexible attribute alone, the social media features alone, and the social media features in combinations. This work compares the results of different flexible attributes alone to explore which have the greatest impact on the confidence and support of the resulting rules. Each set of rules contains an average of 1,700 action rules. At least one rule applies to approximately 97% of the input tuples at the associated price level. This is termed as the coverage of the rule set and can be used as a measure of its applicability. However, the confidence in these rules is quite low. As all the features discussed here have an extremely low cost, exploring low confidence rules is not a concern. To make the changes these rules suggest, an artist would only need to rewrite their artwork descriptions or change their profile. While some individual rules do have high confidence, the average confidence for the before rule is approximately 13%.

In response to the low confidence of the initial sets of rules tested, additional rule sets were generated with greater constraints on the generation process. Rules were only generated to move each artwork up to the next higher price level. The social media flexible features (abbreviated in tables as SM) were used to generate rules as a group. The word counts (abbreviated in tables as WC) of both the artist’s biography and description (abbreviated as Bio. and Desc., respectively) and text polarity features (abbreviated as Ps. for positive sentiment and Ng. for negative sentiment) were all used separately. To assess if overly strict support requirements were causing low confidence, the minimum support level for a rule was lowered from 50 to two for both the before and after rules, but the minimum confidence for a rule to be considered was changed to 60% for the before and after rules. Since the recommended changes would be very simple and low risk, a very low threshold for support was not a concern. The minimum number of attributes for the stable portion of the rule is one and the maximum number is five. The rule set generated here had a considerably lower average coverage at 9.375% but a considerably higher average confidence for the before rules at 79.16%, with the after rules at 79.2%. Considerably fewer rules were generated, and the average number of rules per group was 1,155. Table 2 displays the coverage of each rule set for the selected price levels and flexible attribute. This represents the percentage of objects at the lower price level that had at least one applicable rule to raise that object to the selected higher price level.
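Coverage as defined here can be sketched in a few lines of Python. Each rule is represented only by the antecedent of its before rule, that is, the stable values together with the starting flexible values. The record layout and function name are hypothetical.

```python
def coverage(records, before_antecedents, from_level):
    """Percentage of records at from_level matched by at least one before-rule antecedent.

    records: list of dicts with discretized attribute values and a "price" level.
    before_antecedents: list of dicts of attribute name -> required value.
    """
    at_level = [record for record in records if record["price"] == from_level]
    if not at_level:
        return 0.0
    covered = sum(
        1 for record in at_level
        if any(
            all(record.get(name) == value for name, value in antecedent.items())
            for antecedent in before_antecedents
        )
    )
    return 100.0 * covered / len(at_level)
```

A record is counted once no matter how many rules apply to it, so a larger rule set does not by itself guarantee higher coverage.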

Table 2 Coverage of rules generated using base stable features

The level of support varies dramatically depending on the price level being addressed. The exact values of the prices referenced in the next table are as follows: (< 12.97 - 58.28), (58.28 - 95.90), (95.90 - 130.04), (130.04 - 188.13), (188.13 - 250), (250 - 350), (350 - 490), (490 - 742), (742 - 1351.24), (1351.24 - > 1,000,000). For example, the coverage of the rule sets that move artworks from the lowest prices, level one, to the next lowest price level, level two, ranges between 19.24% and 38.89% across all the different sets of flexible attributes. Notably, the social media attributes had markedly worse performance than the others. The attributes that changed an artwork’s description had less consistently high coverage than the attributes that addressed an artist’s biography. At higher values for price, coverage decreases. This may be due to the size of the shifts necessary to move prices from one tier to another at the higher levels.

To expand on the set of stable attributes, another set of rules was generated that added the dominant color of the work to the list of stable attributes. This set was generated using a randomly selected subset with approximately 100,000 tuples. LISp-Miner’s discretization tools were used to create a new set of partitions similar to those used previously for this set. The same restrictions on rule generation were repeated. Rules must have a minimum confidence of 60% for the before and after rules, and rules must have a minimum support of two. Slightly more rules were generated per set than in the previous variation with an average number of rules per group of 1,716. This set did have a slightly higher average coverage of 16.408% and a similar average before confidence of 78.65%, with an after confidence of 78.73%. As with the previous set of rules, the coverage shifts across different price levels. The coverage of each rule group changes depending on the selected flexible attributes and the selected price levels as demonstrated in Table 3.

Table 3 Coverage of rules generated using base stable features and main color

Rules were developed using new criteria to improve on the coverage of the rules generated above. New stable attributes were considered, support and confidence were still restricted, and rules could now have up to two flexible attributes and two stable attributes. Minor adjustments were made to the manual discretization, so that the attribute medium grew from eight categories, including an “NA” category, to ten with an additional “NA” category. The maximum number of tuples used for rule generation was limited to 50,000. With these modifications, rules were generated with the coverage levels shown in Table 4.

Table 4 Coverage with new attributes

No set of stable attributes has consistently and notably higher coverage than any other. Many rules were generated; for example, over 3,000 rules were formed using only base features moving from price level one to price level two. Adding a second flexible attribute that could be changed led to a very dramatic increase in the number of rules formed.

A sample G-Action rule from this set is “Artistic_style (Surrealistic) & Width (very low) && Bio_pos (very high) & Desc_pos (lower) ->Price(1), Artistic_style (Surrealistic) & Width (very low) && Bio_pos (avg) & Desc_pos (higher) ->Price(2)”. In this example, the before rule is “Artistic_style (Surrealistic) & Width (very low) && Bio_pos (very high) & Desc_pos (lower) ->Price(1)”, and the after rule is “Artistic_style (Surrealistic) & Width (very low) && Bio_pos (avg) & Desc_pos (higher) ->Price(2)”. This may be read as follows: when Artistic_style is Surrealistic and Width is very low, if Bio_pos changes from very high to avg and Desc_pos changes from lower to higher, then Price will change from 1 to 2.

To raise the accuracy of the models and gain better coverage, the clusters based on artist style discussed in a previous section were used to form rules. While ten clusters were generated, there were significant size discrepancies. One cluster is a very distant outlier with only a single prolific artist. Another contains over 1,759 artists. To generate the following sets of rules, the maximum number of works considered was 50,000, but fewer works were used if the cluster was very small.

The addition of personalization noticeably improves the coverage of the rule sets. This is not consistent for all clusters, such as the sixth edge-based cluster, likely due to the small number of represented artists. Clusters with more artists, such as the fourth edge-based cluster, have much better coverage. However, the pattern of lower coverage in middle price ranges and lack of dramatic shifts in coverage due to stable attributes is consistent. In Tables 5 and 6, the average coverage percentage of the ten clusters is displayed.

Table 5 Aggregate coverage of gray block clusters
Table 6 Aggregate coverage of edge block clusters

Some interesting patterns emerge when examining this information. Averaged across the partitions, coverage is consistently higher than it is for a random selection of tuples. More interesting is how the different partitioning methods shift in their level of coverage depending on the price ranges being considered. The gray block partition has higher coverage for the middle range values, especially when moving from level 3 to level 4, level 5 to level 6, and level 6 to level 7. By contrast, the edge-blocking method of partitioning handles the upper and lower ranges better. The rules were generated separately, so there may be slight differences in what each individual partition considers as set 3. The change in coverage shows potential for further exploration.

Rules were generated for all of the clusters with a few exceptions. As an example, the following rule was created for the first cluster of the brightness average, referred to as gray, “Subject(Flowers and plants) & Width(very high) && Bio_pos(avg) & Biowc(higher) ->Price(9)”, “Subject(Flowers and plants) & Width(very high) && Bio_pos(higher) & Biowc(very low) ->Price(10)”.

After noting the shifts in coverage depending on the partitioning method selected, the next step was to explore more methods of partitioning the dataset to examine what patterns would emerge.

Further rules were generated with slightly altered restrictions. The number of flexible attributes was restricted to one for this set of comparisons. The same set of stable and flexible attributes discussed previously was used, with the addition of the three main colors as stable attributes. The same restrictions on confidence and support were applied as in the previous example. All partitions have a maximum size of 50,000 tuples, but some, especially those generated from edge or gray blocks, are smaller.

Several different methods of partitioning the dataset were explored. First, for the purposes of comparison, rules were generated with the method described above for four sets of 50,000 randomly selected tuples. As shown previously, partitioning on visual features does have value, so partitions were generated using the gray and edge blocks as above. Additionally, four partitions were constructed based on the average number of edges in an individual work. These were added primarily for comparison with the results obtained when the number of edges is broken into blocks across an artist’s posted body of work. To continue the use of visual features, five partitions were made using PAD (Pleasure, Arousal and Dominance) and five using AWH (Activity, Weight and Heat). The PAD partitions were made using k-means clustering on the PAD scores of each individual work, and the process was repeated using the AWH scores. Finally, six partitions were constructed based on the listed number of followers for an artist. This was meant to separate artists that are highly visible from those that are less widely viewed.
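The k-means step can be illustrated with a minimal standard-library sketch of Lloyd's algorithm applied to three-dimensional score vectors. It is not the implementation used to build the partitions. The PAD values below are hypothetical, the function names are illustrative, and `k=2` is chosen only to fit the toy data; the partitions described above used five clusters.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Basic Lloyd's algorithm: returns one cluster label per point and the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical (pleasure, arousal, dominance) scores for four works.
pad_scores = [
    (0.10, 0.10, 0.10),
    (0.12, 0.10, 0.10),
    (0.90, 0.90, 0.90),
    (0.92, 0.90, 0.90),
]
labels, centroids = kmeans(pad_scores, k=2)
print(labels)
```

Each label then identifies the partition a work belongs to. The same function could be applied to AWH scores without modification, since it only assumes numeric vectors of equal length.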

Table 7 shows the average coverage of the partitions. The number of followers partition is abbreviated as Follow, the gray blocks partition as G. Bl, and the edge blocks partition as E. Bl. Rules were generated for increases in price from all levels, rather than just to the next highest step.

Table 7 Partition coverage averages

When examining the aggregate scores, a number of patterns emerge. First, it is evident that using a form of partitioning produces better coverage than using randomly selected tuples.

Second, while there is some variation regarding which partition generates the best coverage, the gray block and edge block partitions produce the highest average coverage for most price pairs. Interestingly, the PAD and follower partitions also perform well, though not as well as the gray block and edge block partitions. The edge partition, which is based on individual works, performs considerably worse than the edge block partition. Also, the pattern of whether the edge or gray block partition performs better on a given price pair does not always match the results from the previous set of rules generated.

6 Conclusions and future work

This work only begins to address the potential for the use of action rules to improve artist sales in the market for contemporary fine art. Many potential avenues for further research exist in the development of features for action rules.

In the initial set of results, the very high rate of coverage is promising. However, the confidence level, while high for specific rules, is on average quite low. This issue may be attributable to allowing too many low confidence rules. It may also be due to the partitions of the feature sets being overly broad. However, low confidence for the set of G-Action rules is not a barrier to their being useful to an artist. These rules are extremely low cost and easily implemented. Stakeholders may want to try them even on the small chance that the rules will provide an improvement. The second and third sets of rules discussed demonstrate that it is possible to significantly raise the average confidence in the rules, but at the expense of lowering the average coverage of the rule sets.

The addition of personalization by clustering visually similar artists increased the coverage of many of the derived rule sets, and on average the coverage was consistently higher when similar artists were grouped. However, the question of which method of partitioning is most effective remains open for further exploration. Often the methods that develop features to represent an artist’s overall visual style had the highest level of coverage. The edge and gray block methods group artists, rather than artworks. Interestingly, comparing the results of partitioning based on the percentage of edges in a work with those of the block method demonstrates that partitioning based on the overall traits of the artist produces superior rule sets to partitioning based solely on the features of an individual work. This pattern continues to be visible in the high coverage of the partitions based on the gray blocks method and on the number of people following a given artist. One exception to this pattern is that partitions formed using the PAD cluster method, which is based on individual works, often had high coverage.

Certain partitions produced higher coverage on particular sets of prices. However, which partitions perform better in a specific situation remains an avenue for later research. Similarly, whether the performance of individual partitions changes depending on the prices or the selection of attributes is an interesting topic for further exploration.

One strong avenue for future research is the development of more features for rule generation. The development of new stable visual attributes did not have a dramatic impact, so flexible attributes, especially flexible attributes with a higher cost, should be considered. With higher accuracy, stakeholders may become willing to make more drastic changes. The development of more flexible attributes, as well as the exploration of the attributes discussed here in combination, has potential value for stakeholders in the art market. In addition to expanding the list of artwork features, artists have characteristics that may serve as additional stable attributes. A greater exploration of larger-scale career changes an artist could make has great potential for research.