Keywords

1 Introduction

The Market Basket Analysis (MBA) is the examination and discovery of combinations of products and services that customers buy most frequently [1]. The idea is to find associations between different market offerings from the hundreds or thousands of transactions that buyers make in a given time. For example, typically in a grocery store, the manager would want to know if a particular product tends to be sold with something else. This type of knowledge is useful for establishing promotion and product bundle strategies.

The discovery of these kinds of associations has been limited to the use of associations rules which are generated from the Apriori Algorithm [8] and other more modern variants (see for example: [2,3,4,5,6]) that overcome the problem of the large number of rules that the algorithm generates, and that reduce the computational cost of rules mining by: avoiding scanning the database repeatedly, adding additional restrictions to the rule pattern structure, and through parallelization methods [7].

The popularity of the association rules may be due to the ease of their interpretation at a micro level, but they do not deliver as a result a comprehensive or global model of all inter-relationships of all the available market offer. In this sense, a network approach seems to be useful if the analyst wants to have a more systemic view of the purchasing behavior based on the relationships between the elements that make up this interconnected network. For example, using a network graph mining is an alternative to display large volumes of information by applying filters with different levels of thresholds to intuitively understand the relationships between products [10]. Using a network approach, it has also been possible to establish a methodology that simplifies the search for product sets that have a high level of co-occurrence among consumers, using minimum spanning trees to reveal the interconnectivity of products with the highest level of co-occurrence [11].

Motivated by this last line of research, we also use a network approach, but we take a step further, not only in analyzing the properties of a network of products but also in using a Boltzmann machine that allows us to learn from the relationships between different market offerings. We assume that each element of the market offer is part of a complex system with intricate interrelationships, which are determined by the revealed preferences of the buyers. Thus, transaction data from many customers is taken as input to initiate a learning process resulting in a network of couplings. The network of couplings contains all the information that determines how a certain product relates to any other product that is part of the market offer. This network is equivalent to a complete network where the magnitude of the weights of each edge quantifies the level and sign of the node-node interaction.

The strongest links carry a particular significance in the network of couplings. For example, a large magnitude positive coupling between two products in our model implies a high probability of activation of those two products simultaneously. Conversely, a large negative coupling implies a high probability that while one product is active, the other is not. In this way, the strongest links tell us a lot about the purchasing behavior of consumers. It should be noted that this is not the same as computing the correlation or the probability that pair of products will be carried at the same time in a market basket. Our approach considers the discovery of the joint probability distribution of the states of a system, i.e., the probability that a particular market basket will occur. In this process, the model is able to estimate these coupling parameters that are part of the network of couplings.

To reveal the role of the strongest links in the system, we use a minimum spanning tree (MST) over the network of couplings. By avoiding weak or negative links, the MST determines a particular cluster of edges that represent the backbone of the coupling network. In this way, we can propose a simple iterative approach of product clustering that ensures us sets of high-frequency purchase products, than could serve as a recommendation system for promotion activities.

2 Ising Model and Minimum Spanning Tree

One problem with the MBA is that the possible number of combinations of items conforming a market basket is very large. For example, a convenience grocery store would carry 2000 stock keeping units (SKUs). If on average a customer buys 10 SKUs, the total number of possible baskets with 10 products would be approximately \(2{.}76^{26}\). From this set of baskets, finding which are the most attractive for buyers is already a difficult task, even more so if from these baskets, we try to discover the strongest association relationships between products.

To deal with this problem, we model the MBA as a collective purchasing system resembling the Ising magnetic system [18]. Each part of this system is composed of microscopic variables or spins \(s_{i}\) that have a certain probability of interacting with other spins. Spins can take values of −1 or 1. To be more precise we describe a market basket as a vector \(\mathbf {s}=(s_{1},...,s_{N})\) where N is the number of spins or elements of the market offer (e.g., SKUs, or aggregated levels of categories or types of products).

The energy \(H=H(\mathbf {s})\) describes the system in a configuration \(\mathbf {s}\). In other words, the macroscopic state of the system is defined by the values of the local spin magnetizations. This energy of the system as a function of the spins is:

$$\begin{aligned} H = -\sum _{\lbrace i,k \rbrace } J_{ik}s_{i}s_{k} - \sum h_{i}s_{i} \end{aligned}$$
(1)

For MBA purposes, energy H is a proxy of the utility function \(U(\mathbf {s})\) [9], in which \(U(\mathbf {s})=-H(\mathbf {s})\) with pairwise and idiosyncratic parts. The expression \(\lbrace i,k \rbrace \) indicates that the summation is made on the nearby neighbors. \(J_{ik}\) are the values of the interactions between spins and \(h_{i}\) is the external magnetic field. When \(J_{ik} > 0\), there is a ferromagnetic interaction, which causes the spin i and k stores to be oriented in the same direction. When \(J_{ik} < 0\), then one gets antiferromagnetic orientation, that is, two spins i and k tend to be in opposite orientations.

Unlike the spin glass studied in statistical physics where the connections are based on regular lattices, the collective purchasing system is equivalent to a complete network, or a network of couplings, that is, each spin is connected to the rest of the \(N-1\) spins through the parameters \(J_{ik}\), resembling the infinite-ranged version of the Edwards-Anderson model [19] of the spin glass proposed by Sherrigton and Kirkpatrick [20].

We would like to know the probability \(P(\mathbf {s})\) that a particular configuration \(\mathbf {s}\) is found in the system. This probability can be written in the form of the Ising distribution:

$$\begin{aligned} P(\mathbf {s}) = \frac{1}{Z} e^{-H(\mathbf {s})} \end{aligned}$$
(2)

where Z is the partition function,

$$\begin{aligned} Z = \sum _{\{ s_{i}\} } e^{-H(\mathbf {s})} \end{aligned}$$
(3)

and the summation is running over the \(2^{N}\) points of phase space.

The problem of finding \(P(\mathbf {s})\) is hard in general and it corresponds to the inverse problem of inferring the model parameters \(J_{ik}\). For N spins, the number of parameters to infer will be \(N(N-1)/2\).

To find the parameters, we use Boltzmann learning algorithm [12], following the same procedure for transactional datasets [13]. The algorithm is initialized with an initial guess of the parameters, and then in the negative learning phase, the averages and pair correlations are calculated. The parameters are then recalculated according to the contrastive divergence formula [16]. The averages and pair correlations are calculated according to a Metropolis-hasting simulation process under the Ising distribution.

Once the coupling parameters have been found, we have the matrix of couplings, which can be represented as a complete network of interactions among spins, that we call coupling network. This information is useful to understand the behavior of the system, similar to what would be a magnetic lattice system, with the difference that the network is not regular. Similar to a spin glass, we have a kind of a disordered system in which the elements can interact in a ferromagnetic or antiferromagnetic way with a certain probability.

To describe a disordered physical system, it would be necessary to study the physical quantities or critical exponents that depend on some general features of the system. This is beyond the scope of this paper, however, it has been found that minimum spanning trees (MST) is a universal property of the system since their geometry is not altered by the level of existing disorder [14].

The MST of a graph G with vertex V, edges E and with a cost \(l_{e}\), is assigned to each edge \(e \in E\). In our case, we find the MST over the network of couplings, which is equivalent to a subset of G edges that connects all the vertices with no cycles. Such a tree exists because the network is connected. So the MST T is a spanning tree such that the sum of all the costs of its edges, l(T):

$$\begin{aligned} l(T) = \sum _{e \in T} l_{e} \end{aligned}$$
(4)

is minimized over the set of all spanning trees on G. In disordered systems, the path of an MST is equivalent to the path in which the energy barriers are the smallest [14, 15]. In this way, we can take advantage of this interpretation to say that the edges of the MST over the coupling network have an energy \(e_{ik}\). Thus, the total energy is the sum of all the bonds that shape the MST:

$$\begin{aligned} E^{mst} = \sum _{ik \in T} e_{ik} \end{aligned}$$
(5)

This value is unique to each system. Since the paths in the MST are those in which energy barriers are minimal, we can say that the MST offers us a way in which different elements of the system can be accessed at minimal cost. For example, given the MST T, if we want the energy barrier to access a set of vertex \(v \in V\), say, \(E_{v}^{mst}\), it would simply be:

$$\begin{aligned} E_{v}^{mst} = \sum _{i,k \in v} e_{ik} \end{aligned}$$
(6)

3 Model

The probability of a given state \(\mathbf {s}\), \(P(\mathbf {s})\) depends on energy H. Since we are interested in the interaction between the spins of the system, we define the energy of couplings as the energetic contribution of energy due to that interaction as between the spins that are at v:

$$\begin{aligned} E_{v}^{c}(\mathbf {s}) = -\sum _{\lbrace i,k \rbrace } J_{ik}s_{i}s_{k} \end{aligned}$$
(7)

Thus, low level coupling energies \(E_{v}^{c}\) indicate stable system configurations (or ground states). This gives us information on those system states that are most likely, or most useful to the consumer. If we combine this information \(E_{v}^{c}(\mathbf {s})\) with the minimum energy barriers \(E_{v}^{mst}\), it would be possible to discern what changes can be made to a particular system configuration (e.g. a minimum energy one) that will lead to a more likely state. A transition from one configuration to another occurs in such a way that the energy involved is minimal.

Associated to an initial configuration \(\mathbf {s}^i\) with a set v of active nodes and with cardinality \(n(v)=K\), we have the energy of coupling \(E_{\mathbf {s}^i}^c\) and the energy barrier \(E_{\mathbf {s}^i}^{mst}\). We want to find a configuration \(\mathbf {s}\) with a set \(v'\) of active spins in which \(v \subset v'\), with cardinality \(n(v')=K+1\), with coupling energy and energy barrier \(E_{\mathbf {s}}^c\) and \(E_{\mathbf {s}}^{mst}\) respectively, such that \(E_{\mathbf {s}}^c + E_{\mathbf {s}}^{mst}\) is minimum.

This process of minimization is equivalent to the search for configurations that increase the size of market baskets from an initial basket to a final one, sacrificing utility to the minimum, but whose transition from one to the other is the least costly in terms of energy. The possible universe of solutions of \(\mathbf {s}\) to minimize E is huge, however, given that the MST is the path of minimum energetic barriers, the spectrum of possible solutions is reduced considerably to only testing those solutions \(\mathbf {s}\) in which \(v' \in T\).

The MST offers the possibility of aggregating products to the cluster at the minimum energy cost, however, this does not ensures that the cluster is at the same time a basket of minimum coupling energy. It can happen that the cluster with \(K-1\) products is one of minimum energy, but the aggregation of a new product will make that the cluster with K products is not minimal basket in terms of energy, which could lead to a violation of transitivity. For example, the combination of products A and B represent a cluster of minimum energy, as well as the combination of product B and C, but the cluster with the combination of the three products will not necessarily be of minimum energy. To deal with this problem, a simple solution is to minimize \(\alpha E_{\mathbf {s}}^c + (1 - \alpha ) E_{\mathbf {s}}^{mst}\). So, when \(\alpha =1\), we give priority to finding clusters of minimum energy (but not at minimum costs), and if \(\alpha =0\) we find clusters of lower energy cost (but not at minimum coupling energy).

4 Method

4.1 Data

To show an application of the proposed model, we take a real database of a national supermarket chain. The data contains 42077 transactions of a branch located in Santiago, Chile. The records indicate the customer’s id, and the shopping basket, that is, the set of products that the customer bought during their visit to the supermarket. To simplify the analysis, we took the twenty-five subcategories of products with the highest level of turnover. In this way, a market basket of each purchase is represented by a vector \(\mathbf {s}\) of length 25, with values 1 or \(-1\) indicating whether the subcategory is present or not respectively its the basket.

4.2 Learning

We inferred the couplings parameters of the Ising model using 400 steps of the Boltzmann learning with a decreasing learning rate of \(\nu =0.8\) and a decay of 0.0001. In each step, the means and pair correlations were obtained from a Metropolis-Hasting process using 40000 steps. With this number of steps of the Boltzmann machine we have empirically ensured a convergence of the parameters with an RMSE of less than \(10^{-3}\).

5 Results

5.1 Distribution of the Couplings

It is worth mentioning that the mean of the inferred couplings is negative equal to 0.006. Figure 1 shows the histogram of the couplings. A proportion close to 39% of the couplings are negative and the remaining 61% are positive ones. This shows that the system is disordered, with a tendency to have most pairwise relationships anti-ferromagnetic, i.e., pairs of products tend not to be present simultaneously in the same shopping basket.

Fig. 1.
figure 1

Histogram of the couplings inferred by Boltzmann’s machine. The mean of the couplings is \(\langle J_{ik} \rangle =0.006\)

5.2 Consistency

First, it is necessary to test that the Equation model 2 correctly reproduces empirical data. For this, we compare the means and pairwise correlations of the real data with the data sampled with Metropolis-Hasting using the inferred magnetizations and couplings.

Fig. 2.
figure 2

Scatterplots of the following observables: A- one-body averages \(\langle s_{i}s_{j} \rangle \), B- pairwise connections \(\langle s_{i}s_{j} \rangle \) and, C- two-body connections \(C_{ij}=\langle s_{i}s_{j} \rangle - \langle s_{i} \rangle \langle s_{j} \rangle \). Empirical and estimated values on x and y-axes respectively. \(\lbrace \bullet ^* \rbrace \) are the estimated observables computed from the Metropolis-Hasting sampling.

As shown in Fig. 2, the recovery of the spins orientation \(\langle s_{i} \rangle \) (which correspond to the frequency of purchase of the products) has been satisfactory, also, the pairwise connections \(\langle s_{i}s_{j} \rangle \) and the two-body averages (equivalent to the correlation between products) \(C_{ij}\). The root mean square error (RMSE) for two-body connections, pairwise connections and one-body averages were: 0.000939 (\(\rho =0.999\)), 0.0.00062 (\(\rho =0.998\)) and 0.00058 (\(\rho =0.996\)) respectively.

5.3 Network of Couplings and MST

Figure 3A shows the array of couplings \(J_{ik}\) found by the Boltzmann learning using the real dataset. It is clear that most interactions between subcategories are negative while a few are positive, which means a frustrated interaction system [17], similar to what happens in physical spin glass [18]. Although this information is interesting because it describes the behavior of the collective purchasing system, it is not useful to find combinations of products that are susceptible to promotion strategies or incentives.

The next step is to find the MST over the network of couplings. However, to find the MST, it is necessary to take into consideration that a positive coupling implies a complementary relationship between products (if product i is in the basket, then the product k will also tend to be present), while a negative sign coupling implies a substitution relationship between products (if the product i is in the basket, then the product k will tend not to be present). To accomplish this, it is necessary to carry out a proper transformation of the couplings in such a way that the original order of the couplings is preserved. This transformation converts the values of the couplings into positive magnitudes, representing a proxy of distance between vertices, in such a way that with these values they serve as input to the Prim’s algorithm to find the MST.

Figure 3B shows the MST found. It is easy to see that some of the strongest links between nodes are those that join subcategory 80 (vitaminized noodles) with subcategory 11 (rice grade 2), 78 (flavored noodles) and 7 (vegetable oil), etc. In the next section, we see how to look for interesting market baskets according to the information of minimum distance links and coupling energies.

Fig. 3.
figure 3

Graph networks of couplings found. Blue and orange colors of the edges correspond to negative and positive interaction strengths respectively. In A the network of couplings. In B the minimum spanning tree. In this figure we draw the weight (greater thickness, greater magnitude) and signs of the original edges (blue is negative and orange is positive). (Color figure online)

5.4 Looking for Interesting Market Baskets

To exemplify the usefulness of the methodology, let us begin by choosing an initial state \(\mathbf {s}^{i}\). We could use any state as a starting point, however, to describe the growth process of the cluster, we start with only one active product (\(n(v)=1\)). To select this product we select the one with the highest purchase frequency at the supermarket that corresponds to vitaminized noodles, so, \(\mathbf {s}^i=\{80\}\). From this vertex, we look for another one. The next vector \(v'\) of active vertices provides the minimum increase of energy and at the same time go through the lowest level of energy barrier that must be overcome to cross the MST of v to \(v'\). The result will be to increase the cardinality of the vector achieving \(\mathbf {s}=\{80, 78\}\). That is to say, the easiest way to begin to cross the MST from 80, is to go to 78 (flavored noodles). At the same time, this state is the lowest possible energy configuration so it ensures a high purchase frequency.

Fig. 4.
figure 4

The path of energies on the plane \(E_{v}^{c}\);\(E_{v}^{mst}\) starting from vertex 80 and using \(\alpha =0{.}5\).

We can repeat this same process several times. The order in which the products appear is important because it defines the way in which they are related. In Fig. 4 we can see the path by several vertices of the MST in the plane \(E_{v}^{c}\)-\(E_{v}^{mst}\). This path describes the best aggregation of products from one initial state to the next, in terms of coupling energy and energy barriers on the MST. For example, if the retail manager wanted to design a special bundle promotion package with product 80, the best way to do it is through product 78. And if he wanted to extend the size of the promotion, the next product to include would be product 11, and so on.

It is interesting to note first, that as the number of products added to the cluster grows, the coupling energy decreases, which means that the probability of activation of the indicated spins increases. However, energy barriers are increasing due to the greater distance needed to cover more products. However, the cluster represents the most likely market basket with K products beginning the process from product 80.

Second, the energy gradient from vertex 80 to 214 (beaten yogurt) is constant. However, from this last vertex to 100 (natural milk 1L) the coupling energy does not vary too much, and only increases the energy barrier. This means that the probability of those states does not change too much, but the cost of adding more products does. Observing the MST, this behavior is related to the structure of the graph in which the vertex 108 seems to operate as a link between two large sub-trees of the MST.

Fig. 5.
figure 5

Paths of energies starting from different vertexs using \(\alpha =0{.}5\). (Color figure online)

Figure 5 shows three paths for different starting vertices. The black line is the original starting from node 80. If the starting point is now 72 (machine detergent powder) which is a leaf on the MST, it is possible to observe that the path at the beginning is totally different from the original (dark gold color line), however at iteration number 9, both paths become identical. This means that from this point on, promotion strategies should be the same. When starting from vertex 214 (beaten yogurt) which is located on other sub-tree of the MST, we observe (dark orange line color) a totally different behavior, that seems to be more difficult to implement compared with the other because of the fast increase of barrier energies. This means that these products are weakly related to each other.

6 Conclusions

The spin glasses view of the purchase system considered in this study, is an approach completely different from the traditional task of finding association rules. In this proposal, we are able to find information about the conjoint probability distribution of all possible states of the system, ie., the market baskets.

The distribution of the inferred couplings indicates us that the collective system of purchases has similarities with spin glasses, a kind on material studied in condensed matter physics, in which particles can be related in a ferromagnetic or antiferromagnetic way. In the first case, the alignment of the moments of the particles occurs in the same direction, while in the second, the moments are aligned in opposite directions. Since many particles co-exist with both types of interactions, we are in the presence of a disordered system. This characteristic has important practical implications. In economic terms, it implies that certain products are complementary capacity, while others have substitute capacity. In the first case, the presence of one product in the market basket encourages the presence of another product, while in the second case, the presence of one product discourages the presence of another.

To find those complementary relationships between products in this disordered system, we use the universal property of MSTs to describe the complex inter-relationships. The MSTs are unique to each system and show the path that links each particle of the system with minimum energy, or in other words, show the path through each product at minimum cost.

In this paper, we have shown that the information of the coupling energy of a particular system state, and the link energy of the MST, offers us a guide to finding interesting market baskets, which describe a high probability of occurrence but also have low intervention cost. This approach is quite different from the traditional approach, which typically resorts to the well-known association rules as a classic tool of market basket analysis and whose disadvantage lies mainly in the high number of rules that the analyst must investigate, and the difficulty in observing the inter-relationships between different products in the absence of a systemic view. On the other hand, our approach allows us a more unified understanding of the inter-relationships between different products and a successive approximation of how attractive market baskets can be generated.