1 Motivation and Introduction

How accurately prediction markets can predict remains, to date, essentially an empirical issue. However, empirical studies per se cannot articulate why the market performed extremely well for some events and poorly for others [2]. While a number of studies try to identify the factors contributing to these successes or failures, the explanations supporting the causal links they find remain verbal and informal, and a rigorous mechanism has not been explicitly spelled out. This is partially due to the limited analytical tractability of the prediction markets that operate in practice. In this article, we argue that the spatial configuration, i.e., the distribution of information over agents situated in different places, can matter for the prediction accuracy of prediction markets. However, since the usual analytical model cannot effectively deal with these geographical variables, an agent-based spatial model of prediction markets is proposed to address the geographical significance. To begin this line of research, our model is tailored only to future events related to political elections, normally known as political futures. In other words, we shall show how geographical factors can affect the prediction accuracy of political futures markets.

The rest of the paper is organized as follows. Section 2 introduces our proposed spatial agent-based prediction markets and the two essential ingredients in the model, namely, toleration capacity and exploration capacity. Section 3 discusses the design of our simulation and shows the simulation results. Section 4 gives the concluding remarks.

2 The Model

2.1 The Market

Network-Based Formation of Expectations and Reservation Prices. Our first step is to make the social network explicit (Sect. 2.2). Through the given social network, agents disseminate and acquire the information and form their expectations of the future election outcomes, upon which their decisions on bids and asks are based. We assume that, to form an expectation regarding the election outcome, all agents use the sample average as the estimate, and the sample available for each agent is identical to the set of all his connecting agents (to be defined later). In other words, by using the sample proportion of the connecting agents supporting each political candidate, the agent forms his expectations about the share of the vote of each candidate. This estimated share becomes the reservation price held by the agents. To make this point precise, let \(\hat{p}_{i,j}\) be the subjective estimation of agent \(i\) regarding the share of the votes attributed to candidate \(j\), and \(b_{i,j}\) be the reservation price that agent \(i\) holds for the futures related to the vote share of candidate \(j\). Then

$$\begin{aligned} b_{i,j}=\hat{p}_{i,j}= \frac{\#\{k: k \in N_{i} \cap V_{j}\}}{\#N_{i}}, \ \ i=1,2,...,N, \ \ j=1, ...,m, \end{aligned}$$
(1)

where \(N_{i}\) is the set of agent \(i\)’s connecting agents (to be defined later), and \(V_{j}\) is the set of voters who support candidate \(j\). By (1), if the estimated share of the votes of Candidate A is 60 %, then the reservation price of the futures contract for the share of votes of Candidate A is 60 cents. With this reservation price, the agent would not accept any bids which are lower than 60 cents or any asks which are higher than 60 cents.
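As a minimal sketch of Eq. (1), the reservation prices of a single agent can be computed from the political identities of his connecting agents. The function name `reservation_prices` and the example neighborhood are illustrative assumptions, and the handling of an agent without any connecting agents (raising an error) is also an assumption, since Eq. (1) is undefined in that case.

```python
from collections import Counter

def reservation_prices(neighbor_identities, m):
    """Return [b_{i,1}, ..., b_{i,m}] as in Eq. (1).

    neighbor_identities: identities (1..m) of the agent's connecting agents.
    """
    if not neighbor_identities:
        # Assumption: Eq. (1) divides by #N_i, so an empty sample is rejected.
        raise ValueError("agent has no connecting agents")
    counts = Counter(neighbor_identities)
    n = len(neighbor_identities)
    return [counts.get(j, 0) / n for j in range(1, m + 1)]

# Hypothetical sample: 6 supporters of candidate 1, 3 of candidate 2, 1 of candidate 3.
prices = reservation_prices([1] * 6 + [2] * 3 + [3], m=3)
print(prices)  # [0.6, 0.3, 0.1]
```

Because the shares are sample proportions over the same sample, the \(m\) reservation prices of an agent always sum to one.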

Bidding and Asking Strategy. Following most agent-based prediction markets [5, 9], we assume that all agents are zero-intelligence agents (entropy-maximizing agents) in the sense that each agent bids or asks randomly, subject to the constraint of making no expected loss [1, 4]. Therefore, his bid \(p_{b,i,j}\) will be uniformly sampled from the interval between the floor, which is zero cents, and the reservation price \(b_{i,j}\), and his ask \(p_{a,i,j}\) will be uniformly sampled from the interval between his reservation price and the ceiling, which is one dollar, as shown in Eq. (2).

$$\begin{aligned} p_{b,i,j} \sim U[0,b_{i,j}], \ \ p_{a,i,j} \sim U[b_{i,j},1], \ \ i=1,2,...,N, \ \ j=1,2,...,m. \end{aligned}$$
(2)
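The sampling rule in Eq. (2) can be sketched as follows. Prices are expressed in dollars on \([0,1]\); the function names `zi_bid` and `zi_ask` and the seed are illustrative assumptions.

```python
import random

def zi_bid(b, rng):
    """Zero-intelligence bid: uniform on [0, b], never above the reservation price."""
    return rng.uniform(0.0, b)

def zi_ask(b, rng):
    """Zero-intelligence ask: uniform on [b, 1], never below the reservation price."""
    return rng.uniform(b, 1.0)

rng = random.Random(0)      # fixed seed only to make the sketch reproducible
b = 0.60                    # reservation price of 60 cents
bid, ask = zi_bid(b, rng), zi_ask(b, rng)
print(bid <= b <= ask)      # True
```

The no-expected-loss constraint shows up as the two interval bounds: a buyer never pays more than his estimate of the vote share, and a seller never receives less.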

Trading Mechanism. The trading mechanism adopted to run the market is continuous double-auction, the one frequently used in experimental economics to test the Hayek hypothesis [7]. As shown in Fig. 1, our agent-based prediction market starts from a random draw of the agents. Each agent shall be drawn exactly once; in other words, the draw proceeds in a sampling-without-replacement manner. When agent \(i\) is drawn, he will be randomly placed into one of the \(m\) markets and will be equally likely to be assigned either a buyer position or a seller position. He will then submit a bid if he is a buyer and submit an ask if he is a seller. His bid or ask will be placed in the order book. A match happens if either his bid (\(p_{b,i,j}\)) is greater than the remaining lowest ask (best\(p_{a}\)) in the order book or his ask (\(p_{a,i,j}\)) is lower than the remaining highest bid (best\(p_{b}\)). The transaction price will then be determined as best\(p_{a}\) if the former applies or as best\(p_{b}\) if the latter applies.
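The matching rule just described can be sketched with two heaps. This is an illustration under stated assumptions rather than the authors' NetLogo implementation: the class name `OrderBook` is hypothetical, each order is for one contract, and a matched resting order is removed from the book.

```python
import heapq

class OrderBook:
    """Order book for one futures market (one candidate)."""

    def __init__(self):
        self._bids = []   # max-heap of resting bids, stored as negated prices
        self._asks = []   # min-heap of resting asks

    def submit_bid(self, price):
        """Trade at the best ask if the bid exceeds it; otherwise rest in the book."""
        if self._asks and price > self._asks[0]:
            return heapq.heappop(self._asks)      # transaction price = best ask
        heapq.heappush(self._bids, -price)
        return None

    def submit_ask(self, price):
        """Trade at the best bid if the ask undercuts it; otherwise rest in the book."""
        if self._bids and price < -self._bids[0]:
            return -heapq.heappop(self._bids)     # transaction price = best bid
        heapq.heappush(self._asks, price)
        return None

book = OrderBook()
print(book.submit_ask(0.70))  # None (no resting bid yet)
print(book.submit_bid(0.50))  # None (0.50 does not exceed the best ask 0.70)
print(book.submit_bid(0.75))  # 0.7  (trades at the best ask)
print(book.submit_ask(0.40))  # 0.5  (trades at the best bid)
```

Note that the transaction price is always the price of the resting order, so the incoming trader obtains a price at least as good as the one he quoted.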

Fig. 1. The flowchart of the order-book driven prediction market

Fig. 2. Geographical distribution of voters and their political identity. Both panels are the converged configurations using \(v_{1}=45.63\,\%\) (green), \(v_{2}= 51.60\,\%\) (blue), \(v_{3}=2.77\,\%\) (orange), \(N=13,454\), and \(G\) (number of grids) = \(193 \times 193\). The black grids denote the unoccupied cells, and the colored grids denote the occupied cells. The number of occupied cells and the number of unoccupied cells are determined in such a way that the resultant population density is close to 36 % (see Table 1). The two panels differ in terms of the toleration capacity: on the left, \(s=0.75\), and, on the right, \(s=0.25\). (Color figure online)

2.2 Geographical Distribution of Agents

The social networks considered in this paper are generated from the Schelling segregation model [6], in which the location of agents is determined by their toleration capacity for agents with different political identities. In other words, we replace the ethnic heterogeneity of agents in the original Schelling model with their political identity (\(j=1,2,...,m\)). Agents tend to reside in the place which is surrounded by neighbors with the same political identity. Their toleration of neighbors with different political identities is characterized by the parameter, toleration capacity (\(s\)). If the ratio of neighbors with different political identities is larger than this threshold \(s\), they tend to move to a close place which their toleration capacity can handle. This migration process will be iterated until it converges to a fixed configuration. We then use the resultant configuration to represent the geographical distribution of residents with different political identities.
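The migration condition can be sketched as a single predicate. The function name `wants_to_move` is hypothetical, and treating an agent with no neighbors as content is an assumption that the text does not settle.

```python
def wants_to_move(own_identity, neighbor_identities, s):
    """True if the share of neighbors with a different identity exceeds s."""
    if not neighbor_identities:
        return False  # assumption: an isolated agent has no reason to move
    different = sum(1 for k in neighbor_identities if k != own_identity)
    return different / len(neighbor_identities) > s

neighbors = [1, 2, 2, 3]                        # 3 of 4 neighbors differ from identity 1
print(wants_to_move(1, neighbors, s=0.75))      # False (0.75 is not larger than 0.75)
print(wants_to_move(1, neighbors, s=0.25))      # True
```

The same local environment is therefore acceptable to a tolerant agent (\(s=0.75\)) but triggers migration for an intolerant one (\(s=0.25\)), which is why lower values of \(s\) lead to more segregated converged configurations.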

Apart from the toleration capacity, an additional parameter of Schelling’s segregation model is the demographical structure characterized by the percentage of agents of various political identities. Denote them by \(v_{j}\) (\(j=1,2,...,m\)).

$$\begin{aligned} v_{j} = \frac{\#(V_{j})}{N}, j=1,2,...,m, \end{aligned}$$
(3)

where \(N\) is the total number of agents.

Figure 2 demonstrates a geographical distribution of political identities. In this specific example, there are a total of 13,454 agents, distributed on a checkerboard with \(193\times 193\) grids, i.e., with a population density of 36.12 %, and \(m=3\) (three candidates or three political parties): \(v_{1}=45.63\,\%\), \(v_{2}= 51.60\,\%\), and \(v_{3}=2.77\,\%\). Agents with the three political identities are denoted by green (\(j=1\)), blue (\(j=2\)), and orange (\(j=3\)), respectively. Figure 2 therefore shows two of the converged configurations of agents who followed the Schelling rule of migration. The one on the left corresponds to a toleration capacity of 0.75, and the one on the right corresponds to a toleration capacity of 0.25.

Fig. 3. The von Neumann neighborhood with a radius of 2 (left) and 5 (right). Each panel shows the von Neumann neighborhood of agent \(i\), as pointed to by an arrow. (Color figure online)

2.3 Exploration Capacity

For each agent, his information supplier, i.e., his set of connecting agents, is determined by a von Neumann neighborhood with a given radius (\(r\)). This is shown in Fig. 3. As shown in Eq. (1), agents are assumed to know the political identities of all of their connecting agents in the neighborhood (agents in the gray area), and they use this sample (local information) to estimate the share of the votes for each candidate. The radius, \(r\), can be interpreted as the information exploration capacity of the agent. The larger the radius the larger the sample, and hence the less biased and the better the estimation. In this article, we assume that agents are homogeneous with respect to this capacity but would like to examine how this parameter may affect the emergent market performance.
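A von Neumann neighborhood of radius \(r\) consists of the cells whose Manhattan distance from the focal cell is between 1 and \(r\), which gives \(2r(r+1)\) cells. The sketch below enumerates these cells and collects the identities found in them. The dictionary-based grid `occupied` and the function names are illustrative assumptions, and the treatment of the lattice boundary (cells outside the dictionary are simply skipped) is also an assumption.

```python
def von_neumann_offsets(r):
    """Offsets (dx, dy) with Manhattan distance between 1 and r from the focal cell."""
    return [(dx, dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if 0 < abs(dx) + abs(dy) <= r]

def connecting_agents(occupied, x, y, r):
    """Identities of the occupied cells in the radius-r neighborhood of (x, y).

    occupied: hypothetical dict mapping (x, y) -> political identity;
    cells absent from the dict are treated as unoccupied or off-grid.
    """
    return [occupied[(x + dx, y + dy)]
            for dx, dy in von_neumann_offsets(r)
            if (x + dx, y + dy) in occupied]

print(len(von_neumann_offsets(2)), len(von_neumann_offsets(5)))  # 12 60
```

With a population density of about 36 %, an agent with \(r=2\) can therefore expect only a handful of connecting agents, whereas \(r=5\) gives a sample several times larger.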

Fig. 4. Display of the NetLogo program (Color figure online).

2.4 Programming with NetLogo

The above-mentioned spatial agent-based prediction market is programmed with NetLogo 5.0.3 and is available from the OpenABM website. Figure 4 shows a familiar NetLogo display of running this program.

In Fig. 4, the upper left panel (panel A) gives the user-supplied control parameters: \(N= 13,454\), \(v_{1}=40.55\,\%\) (green), \(v_{2}=51.60\,\%\) (blue), \(v_{3}=7.85\,\%\) (orange), \(s=0.50\) (50 %) and \(r=5\). The diagram shown in the right middle panel (panel B) is the converged configuration using the Schelling rule with \(s=0.5\). With a radius of 5, we can obtain the price expectations (reservation prices) of all three futures for all agents, i.e., \(b_{i,j}\) (\(i=1,...,13454\), and \(j=1,2,3\)). The right upper panel (panel C) shows the three histograms of the reservation prices corresponding to the green, blue and orange parties, respectively. The basic statistics, including the mean, the median and the standard deviation, are shown at the very bottom of the figure (panel D). There we can see that the mean and median for the green candidate are 0.4163 and 0.4155, an upward bias of about one percentage point from the true value of 0.4055. For the blue candidate, these two statistics are 0.5008 and 0.5025, a downward bias of roughly one and a half percentage points from the true value of 0.5160. The worst case is perhaps the market for the orange candidate: the two corresponding statistics are 0.1335 and 0.1315, about 1.7 times the true value of 0.0785. Our research question is, then, to what extent this specific network topology affects the accuracy of the prediction market, or the political futures market in our case.

From the histogram, we can further derive the aggregate willingness to buy (when the price is below the reservation price)

$$\begin{aligned} Q^{D}_{j}(p) = \#\{i: b_{i,j} > p \}, \end{aligned}$$
(4)

and the aggregate willingness to sell (when the price is above the reservation price)

$$\begin{aligned} Q^{S}_{j}(p) = \#\{i: b_{i,j} < p \} \end{aligned}$$
(5)

i.e., the demand curve (\(Q^{D}_{j}\)) and the supply curve (\(Q^{S}_{j}\)).
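Equations (4) and (5) are simple counts over the agents' reservation prices, as the following sketch shows. The function names and the five reservation prices are illustrative assumptions.

```python
def demand(reservation_prices, p):
    """Q^D_j(p): number of agents whose reservation price is above p (Eq. 4)."""
    return sum(1 for b in reservation_prices if b > p)

def supply(reservation_prices, p):
    """Q^S_j(p): number of agents whose reservation price is below p (Eq. 5)."""
    return sum(1 for b in reservation_prices if b < p)

b_j = [0.2, 0.4, 0.4, 0.6, 0.8]   # hypothetical reservation prices for one future
print(demand(b_j, 0.5), supply(b_j, 0.5))  # 2 3
print(demand(b_j, 0.4), supply(b_j, 0.4))  # 2 1
```

Agents whose reservation price equals \(p\) exactly are counted on neither side, because both inequalities are strict; demand is non-increasing and supply is non-decreasing in \(p\).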

The demand and supply curves of the three markets are shown in the lower middle and right panels (panel D). Then through the random draws of the agents and their reservation price, the order book for each market is formed, and the corresponding transaction price is generated as the time series shown in the lower left panel of the figure.

3 Simulation

3.1 Simulation Design

The main focus of this paper is to understand how information aggregation is affected by the way information is distributed, as governed by the two control parameters, namely, toleration capacity (\(s\)) and exploration capacity (\(r\)). In fact, we believe that these two parameters, to some extent, characterize the quality of voters, their cultural backgrounds, sociability, and openness. None of these attributes has been mentioned in the original article of the Hayek hypothesis [3]. Presumably, they were all regarded as irrelevant or insignificant. This paper aims to revisit this hypothesis from a cultural and social-psychological aspect.

Given this focus, most parameters should be held constant throughout the simulation; these include \(N\), \(m\), \(d\), and \(G\) (Table 1). Nonetheless, to make the choice of these parameters not entirely arbitrary and to clothe them with some empirical flavor, we use real data from Taiwan to suggest some reasonable values of these parameters. According to the 2010 demographic census data in Taiwan, the number of qualified voters in the 2012 presidential election was 13,453,305. Scaling this number down by a factor of 1,000 and rounding up gives 13,454 agents. Hence, \(N\) is set to 13,454. In addition, by considering the population density of Taiwan, \(d\) is set to 36.12 %, which implies that we need to have a grid size of \(193\times 193\). Hence, \(G\) is also determined. As to the number of candidates, in the most recent presidential election in Taiwan, held in the year 2012, there were three major political parties and hence three major candidates. Hence, \(m\) is set to 3. This finishes the description of the constant parameters in Table 1.
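The arithmetic linking \(N\), \(d\), and \(G\) can be checked in a few lines. This is only one way to reproduce the reported figures; rounding the scaled voter count up and choosing the square side closest to \(\sqrt{N/d}\) are assumptions that happen to match the values in the text.

```python
import math

voters = 13_453_305                  # qualified voters reported in the text
N = math.ceil(voters / 1000)         # assumption: scaled count is rounded up
d = 0.3612                           # target population density
side = round(math.sqrt(N / d))       # assumption: nearest square lattice
density = N / side ** 2              # realized density on the lattice

print(N, side, round(100 * density, 2))  # 13454 193 36.12
```

A \(193\times 193\) lattice has 37,249 cells, so 13,454 of them are occupied and the remaining 23,795 are left empty for agents to migrate into.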

The rest of the prediction market is characterized by four major parameters, \(s\), \(r\), \(v_{1}\), and \(v_{3}\). We first give a range for each of these parameters; each design can be regarded as a three-tuple randomly selected from these ranges. For \(s\), we consider a range from a low toleration capacity (0.26) to a high toleration capacity (0.75), with an increment of 0.01. The exploration capacity (\(r\)) starts with a minimum of 2 and ends with a maximum of 6. Finally, for \(v_{j}\), considering the practice of Taiwan politics, we fix the share of the votes for the small party (\(v_{3}\)) at 3 %, and then allow the other two major parties to vary in opposite directions. Again, from an empirical consideration, the range of \(v_{1}\) is set from 18 to 47, and then \(v_{2}\) takes the rest. We then randomly generate 1,000 designs, and each design is run 50 times. To sum up, we have

$$\begin{aligned} Design_{k} \equiv \{ s_{k}, r_{k}, v_{1,k} \}, k=1,2,..., 1000, \end{aligned}$$
(6)

where

$$\begin{aligned} s_{k} \sim U[0.26, 0.75], \ \ r_{k} \sim U\{2,3,4,5,6\}, \ \ v_{1,k} \sim U[19,47]. \end{aligned}$$
(7)

The random design described above allows us to have enough observations to examine the effect of these two parameters on the emergent market performance.
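A random design in the sense of Eqs. (6) and (7) can be drawn as in the sketch below. Several details are assumptions made for illustration: \(s\) is drawn from the grid 0.26, 0.27, ..., 0.75, \(v_{1}\) is drawn as an integer percentage, the default bounds for \(v_{1}\) follow Eq. (7), and the function name `draw_design` and the seed are hypothetical.

```python
import random

def draw_design(rng, v1_low=19, v1_high=47, v3=3):
    """Draw one design {s, r, v1}; v2 takes whatever v1 and v3 leave over."""
    s = rng.randint(26, 75) / 100      # toleration capacity, in steps of 0.01
    r = rng.randint(2, 6)              # exploration capacity (radius)
    v1 = rng.randint(v1_low, v1_high)  # vote share of party 1, in percent
    v2 = 100 - v3 - v1                 # vote share of party 2, in percent
    return {"s": s, "r": r, "v1": v1, "v2": v2, "v3": v3}

rng = random.Random(42)                # fixed seed for reproducibility only
designs = [draw_design(rng) for _ in range(1000)]
print(len(designs))  # 1000
```

Each of the 1,000 designs would then be passed to the simulation and run 50 times, so that the summary statistics of a design are averages over independent runs.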

3.2 Basic Results

Table 2 shows what the results for each design look like. Notice that we do not present all of them; otherwise, the table would be 1,000 rows long, one row for each of the 1,000 designs. Each row starts with the parameters characterizing the design, namely, \(s, r, v_{1}, v_{2}\), and \(v_{3}\), followed by the key summary statistics of each design, including the mean price, trading volume, and volatility (standard deviation of the price) of each future. Since each design has been run 50 times, all these statistics are averages taken over the 50 runs. For the mean price, we first take the average of the price series for each run (Eq. 9), and then take the average of these averages over the 50 runs (Eq. 8).

Table 1. Tableau of control parameters
$$\begin{aligned} \bar{p}_{j} = \frac{\sum _{l=1}^{50}\bar{p}_{j,l}}{50}, \ \ j=1,2,3, \ \ l=1,2,...,50, \end{aligned}$$
(8)

where

$$\begin{aligned} \bar{p}_{j,l} = \frac{\sum _{t_{j,l} = 1}^{T_{j,l}} p_{j,l}(t_{j,l})}{T_{j,l}}, j = 1, 2, 3, \ \ l=1,2,...,50, \end{aligned}$$
(9)

and \(T_{j,l}\) are the transaction times of future \(j\) in the \(l\)th run.

These three figures, \(\bar{p}_{j}\) (\(j=1,2,3\)), are shown in the first three columns of the right panel of Table 2. The next three columns, \(Vol_{j}\) (\(j=1,2,3\)), are the averages of the trading volume over the 50 runs, and likewise for the price volatility.

$$\begin{aligned} \sigma _j = \frac{\sum _{l=1}^{50}\sigma _{j,l}}{50}, \ \ j = 1, 2, 3; \ \ l=1,2,..., 50, \end{aligned}$$
(12)

where \(\sigma _{j,l}\) is the standard deviation of the price of the \(j\)th future in the \(l\)th run. Table 2, therefore, provides us the basic input (the left panel) and output (the right panel) correspondence which allows us to address further the effect of the two key parameters, \(s\) and \(r\), on the prediction accuracy.
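The averaging in Eqs. (8), (9), and (12) can be sketched for one future as follows. The function name `summarize_future` is hypothetical, and three choices are assumptions: the trading volume of a run is taken to be its number of transactions, the per-run volatility is the population standard deviation, and every run is assumed to contain at least one transaction.

```python
from statistics import mean, pstdev

def summarize_future(runs):
    """runs: one list of transaction prices per run, all for the same future j."""
    run_means = [mean(series) for series in runs]    # \bar{p}_{j,l}, Eq. (9)
    run_sds = [pstdev(series) for series in runs]    # \sigma_{j,l}
    return {
        "mean_price": mean(run_means),               # \bar{p}_j, Eq. (8)
        "volume": mean(len(series) for series in runs),
        "volatility": mean(run_sds),                 # \sigma_j, Eq. (12)
    }

# Two hypothetical runs instead of 50, to keep the example short.
summary = summarize_future([[0.4, 0.6], [0.5, 0.5, 0.5]])
print(summary["volume"])  # 2.5
```

Averaging the run means, rather than pooling all transaction prices, gives every run the same weight regardless of how many transactions it produced.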

Based on Table 2, we shall start with a simple linear regression.

$$\begin{aligned} Y= f(s,r) + \epsilon = \beta _{0} +\beta _{1}s + \beta _{2}r + \epsilon . \end{aligned}$$
(13)
Table 2. Simulation input and output table

The dependent variable \(Y\) is the prediction accuracy based on the chosen error functions. In this paper, we shall use \(\bar{p}_{j}\) as the key predictor of \(v_{j}\) and consider the following four error measures frequently used in the literature.

  1. Mean Absolute Percentage Error (MAPE)

    $$\begin{aligned} Y_{1}= MAPE = \frac{\sum _{j=1}^{m}\mid \bar{p}_{j} -v_{j} \mid /v_{j}}{m} \end{aligned}$$
    (14)

  2. Root Mean Square Error (RMSE)

    $$\begin{aligned} Y_{2}= RMSE=\sqrt{\frac{\sum _{j=1}^{m} (\bar{p}_{j} -v_{j})^{2}}{m}} \end{aligned}$$
    (15)

  3. Mean Square Error (MSE)

    $$\begin{aligned} Y_{3}= MSE = \frac{\sum _{j=1}^{m} (\bar{p}_{j} -v_{j})^{2}}{m} \end{aligned}$$
    (16)

  4. Euclidean Distance (ED)

    $$\begin{aligned} Y_{4}= ED=\sqrt{\sum _{j=1}^{m}(\bar{p}_{j}-v_{j})^{2}} \end{aligned}$$
    (17)

The prediction errors under these four error measures are provided in Table 3. Again, the table is abridged and shows only the first few and the last few rows; a complete table has 1,000 rows. This table then serves as the basis for running the linear regression (13).
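The four error measures in Eqs. (14)-(17) can be computed together, as in the sketch below. The function name `error_measures` and the example prices and vote shares are illustrative assumptions, not values from Table 3.

```python
import math

def error_measures(p_bar, v):
    """Prediction errors of mean prices p_bar against true vote shares v."""
    m = len(v)
    squared = [(p - q) ** 2 for p, q in zip(p_bar, v)]
    mape = sum(abs(p - q) / q for p, q in zip(p_bar, v)) / m   # Eq. (14)
    mse = sum(squared) / m                                      # Eq. (16)
    return {
        "MAPE": mape,
        "RMSE": math.sqrt(mse),                                 # Eq. (15)
        "MSE": mse,
        "ED": math.sqrt(sum(squared)),                          # Eq. (17)
    }

# Hypothetical mean prices and true vote shares for m = 3 futures.
errors = error_measures([0.5, 0.4, 0.1], [0.4, 0.5, 0.1])
print(sorted(errors))  # ['ED', 'MAPE', 'MSE', 'RMSE']
```

Since \(MSE = RMSE^{2}\) and \(ED = \sqrt{m}\,RMSE\), the last three measures rank the designs identically; only MAPE, which weights each error by the inverse of the true vote share, can order them differently.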

The first regression result is shown in Table 4 (the upper panel). There we find that both \(s\) and \(r\) have a negative effect on the prediction accuracy, i.e., they increase the prediction error (\(\beta _{1} >0\) and \(\beta _{2} > 0\)), and the result is consistent regardless of the measure being employed. This result is somewhat counterintuitive, since one might initially have thought that increasing either the toleration capacity (\(s\)) or the exploration capacity (\(r\)) would make individual agents better informed, which in turn would help the information aggregation in the later stage. Why, then, is this not the case here? One possible explanation is that when both \(s\) and \(r\) become larger, depending on the \(v_{j}\), agents are not just better informed, but also more homogeneous in their expectations and reservation prices, which may make transactions harder to occur and the market less liquid. One famous example is Tirole’s zero-trading theorem [8], i.e., in an extreme case where agents are all perfectly informed, there will be no trade in the market; in other words, the market can predict nothing at all in this situation.

3.3 Homogeneity Effect

To see this homogeneity effect, Fig. 5 shows the average trading volume under different vote shares with respect to these two capacities. Three features immediately stand out.

Table 3. Prediction accuracy
Fig. 5. Trading volume, exploration capacity, and toleration capacity. The five sub-diagrams in the left panel are drawn by fixing the exploration capacity (\(r\)) and examining the effect of the toleration capacity (\(s\)) on the trading volume; different values of \(s\) are colored differently. The five sub-diagrams in the right panel are drawn by fixing the toleration capacity (\(s\)) and examining the effect of the exploration capacity on the trading volume; different values of \(r\) are colored differently.

First, there are hump-shaped curves in each sub-diagram with respect to a given exploration capacity (the left panel) or with respect to a given toleration capacity (the right panel) indicating that the trading volume increases when competition between the major political parties is keen, i.e., the share of the vote of the two major candidates is close.

Second, however, the hump-shaped curve has a tendency to shift down with the increase in each of the two capacities. Since the higher the capacities, the more homogeneous the information received by the agents, the pattern of the downward-shifting hump-shaped curves indicates that the trading volume goes down with the degree of homogeneity.

Third, the curvature of the hump-shaped curve also decreases with the increase in the toleration capacity (the left panel) or the increase in the exploration capacity (the right panel). For example, when these capacities are higher, such as up to 70 % (for \(s\)) or up to 6 (for \(r\)), the hump is flattened out. This indicates that uncertainty, measured by the closeness of the two major candidates in their share of the vote, no longer affects the trading volume when voters are homogeneously well-informed. This is not surprising: when voters are homogeneously well-informed, the market uncertainty perceived by voters is reduced and hence even a neck-and-neck competition has little effect on the trading volume. To sum up, our analysis above shows that, in addition to the vote share or market uncertainty, the two capacities also affect the trading volume, and they affect it in a downward direction.

Fig. 6. Price volatility, exploration capacity, and toleration capacity. The five sub-diagrams in the left panel are drawn by fixing the exploration capacity (\(r\)) and examining the effect of the toleration capacity (\(s\)) on the price volatility; different values of \(s\) are colored differently. The five sub-diagrams in the right panel are drawn by fixing the toleration capacity (\(s\)) and examining the effect of the exploration capacity on the price volatility; different values of \(r\) are colored differently.

The same analysis is further carried out for the price volatility. Figure 6 shows the effect of the two capacities on the average price volatility (Eq. 12). Qualitatively speaking, the result is the same: all three features with regard to the effect of the two capacities also hold for the price volatility. The trading volume (the thickness of the market) and the price volatility are together indicators of a functioning market where information is aggregated and revealed. However, when the degree of homogeneity of traders is high, these functions are adversely affected.

3.4 Conditional Regression

Given the homogeneity effect, it would be desirable to control some market characteristics while running the regression against \(s\) and \(r\). Therefore, we propose a second linear regression which takes into account the market characteristics. Two usual market characteristics considered in the literature are the trading volume (\(Vol\)) and the price volatility (\(\sigma \)). Following this convention, we propose the second linear regression (18).

$$\begin{aligned} Y= \beta _{0} +\beta _{1}s + \beta _{2}r + \sum _{i=3}^{5}\beta _{i} Vol_{i-2} + \sum _{i=6}^{8} \beta _{i} \sigma _{i-5}+ \epsilon , \end{aligned}$$
(18)

where \(Vol_{i}\) (\(i=1,2,3\)) is the trading volume of the \(i\)th futures, and \(\sigma _{i}\) (\(i=1,2,3\)) is the price volatility of the corresponding futures.

As we have seen in Sect. 3.3, the trading volume and the price volatility have already been “polluted” by the two capacities (Figs. 5 and 6); in econometrics, this is familiarly known as an endogeneity problem. To take care of it, we first run two auxiliary regressions, one of the trading volume and one of the price volatility, against the two capacities, and then take the residuals as the “cleaned” (filtered) trading volume and volatility. We then use these residuals as independent variables in the market performance regression (18).
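The filtering step can be sketched as an ordinary least-squares regression of a market characteristic on \(s\) and \(r\), followed by taking the residuals. This is a minimal illustration using only the standard library: the function names `ols_fit` and `residualize` and the six data points are hypothetical, and a collinear design would make the solver fail with a division by zero.

```python
def ols_fit(X, y):
    """Least-squares coefficients for y ~ X; each row of X starts with a 1."""
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [x - f * z for x, z in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def residualize(values, s, r):
    """Residuals of the auxiliary regression values ~ 1 + s + r."""
    X = [[1.0, si, ri] for si, ri in zip(s, r)]
    beta = ols_fit(X, values)
    return [v - sum(b * x for b, x in zip(beta, row)) for v, row in zip(values, X)]

# Hypothetical designs: toleration capacity, exploration capacity, trading volume.
s = [0.30, 0.40, 0.50, 0.60, 0.70, 0.35]
r = [2, 6, 3, 5, 4, 2]
vol = [410.0, 395.0, 372.0, 350.0, 333.0, 340.0]
clean_vol = residualize(vol, s, r)
```

By construction the residuals are uncorrelated with \(s\) and \(r\) in the sample, so when they enter regression (18) they carry only the variation in volume or volatility that the two capacities do not explain.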

Table 4. Regression results with market characteristics

The results of regression (18) are shown in the lower panel of Table 4. They show that the inclusion of the market characteristics improves the coefficient of determination (\(R^{2}\)). This result is not difficult to understand. Given the geographical complexity and variability of the two-dimensional lattice, controlling both \(s\) and \(r\) does not automatically imply control of the geographical and other resultant specifications on which the market performance also depends. It has already been shown in regression (13) that \(s\) and \(r\) have only limited explanatory power. For most performance criteria, \(\bar{R}^{2}\) does not even reach 30 % (see Table 4, the upper panel). Therefore, once these specificities are incorporated through other variables, such as the trading volume and the price volatility, a large proportion of the previously unexplained behavior is accounted for (see the significant increase in \(\bar{R}^{2}\) in the lower panel of the same table). We find that, after controlling for the market characteristics, the two capacities can indeed help enhance prediction accuracy. After incorporating the trading volume and the price volatility, \(\beta _{1}\) and \(\beta _{2}\) are both negative for all four accuracy criteria. In other words, conditional on the same trading volume and price volatility, the higher the toleration capacity or the higher the exploration capacity, the better the prediction market can predict.

4 Concluding Remarks

In this article, we address the issue of whether better-informed agents can help prediction markets in a spatial context. The better-informed agents are characterized by their larger toleration capacity (sociability) and exploration capacity. The result is that under the unconditional regression neither capacity shows this enhancement, whereas, after controlling for some market characteristics, the conditional regression shows their significance. Hence, in this sense, our paper shows that the quality of individuals does have a positive effect on information aggregation and on the formation of the wisdom of crowds.

The work can be extended in several directions. First, the network used here is a spatial network. In this digital age, given the significance of social groups in social media, it would be desirable to include a social network as part of the framework, and to study the effect of social network topologies. Second, the behavioral setting of the traders is very simple, i.e., the device of zero intelligence. It would be interesting to consider other behavioral settings involving cognition or learning, such as reinforcement learning or rule-based models. These extensions allow traders to base their decisions upon the information revealed in the order book. Third, the prediction market can be designed with other trading mechanisms, such as the call auction. It would be interesting to know whether these different trading mechanisms matter.