
1 Introduction

In data analysis and data mining processes, various models can be constructed depending on the goal of data processing. Frequently considered tasks include:

  a) data clustering – aimed at explaining the group structure of data patterns,

  b) data visualization – in order to better understand (multidimensional) data sets,

  c) prediction of values or classes of a certain (dependent) variable for new patterns,

  d) justification/explanation of such predictions.

These tasks (when performed on the same dataset for a given problem) are usually executed by different and unrelated tools (dedicated to specific, separate kinds of issues), which can create interpretation difficulties for a researcher or practitioner during data analyses. For example, such diverse techniques as:

  • the k-means method, hierarchical clustering methods, or Kohonen neural networks (Self-Organizing Maps – SOM) [8] are intended exclusively for clustering problems [28]; some of them are also equipped with visualisation capabilities (e.g. dendrograms in hierarchical methods, or 2-dimensional presentation of groups on a plane (map) in the case of Kohonen SOMs),

  • all linear and nonlinear regression tools, and all machine learning regression techniques (e.g. perceptrons, GRNN networks and others), are aimed at solving prediction problems concerning forecasting/evaluation of real values of a dependent variable,

  • all pattern classifiers (like decision trees, the k-nearest neighbour method, the naive Bayes classifier, specific types of neural networks, etc.) are dedicated to assigning the proper class (from a previously approved finite set of classes) to a new (multidimensional) pattern (vector of features).

In paper [16], Morajda and Paliwoda-Pękosz presented the concept of the Self-Organizing Prediction Map (SOPM), a kind of neural map based on the standard Kohonen SOM network (see Sect. 2), which links (within one tool) all the tasks a)–d) listed above. The SOPM model is described here in Sect. 3.

The main goal of the present article is to introduce the concept of another neural map, an extension of SOPM, which apart from tasks a)–d) generates probability distributions for predictions of the dependent variable for new patterns. This new model is named PDCM (from Probability Distribution generating and Clustering Maps). The idea of PDCM (as an enhancement of SOPM) is presented in Sect. 4. Section 5 presents research concerning the application of the proposed PDCM model to real estate data analysis.

2 Neural Maps – Literature Background

The Kohonen neural network, particularly the Self-Organizing Map (SOM), was proposed by T. Kohonen (see e.g. [8]) and has been accepted as a basic type of neural map. In general, SOMs are devoted to solving clustering problems (i.e. identification of groups of similar objects treated as multivariate feature vectors), with the additional possibility of visualizing the recognized groups in a plane (2-dimensional map). They perform cluster analysis (pattern grouping) by mapping the groups existing in a multidimensional feature space onto a two-dimensional map (a rectangular structure of neurons in a plane), while maintaining the topology of the group distribution. Signals delivered by neurons placed in the rectangular map can then be analysed numerically and used for constructing graphs that visualize the arrangement of clusters.

Many publications report the usefulness of these tools in various domains, e.g. in genetics (gene data clustering [20]), chemistry (antioxidant classification in biodiesel [7]), computer systems security (network intrusion detection [12]), and others.

A great many research works show the usefulness of SOMs in management (e.g. in decision-support processes) and in business/economic data analysis. Such applications may particularly concern: business failure prediction [27], city infrastructure management [10], waste management [21], company performance analysis and clustering of companies [3], design of technological processes [25], analysis of social media with utilization in tourism businesses [9], generation of transaction strategies in financial markets [15], identification of bank risk profiles and failure prediction [24], detection of tax evasion [1], protected area management [5], water resources management [2, 22], corporate behaviour analysis [6], and others.

Numerous modifications of the original SOM network have been proposed in the literature; let us mention here only a small sample of the published proposals: a structure composed of many hierarchically layered SOM maps, named HSOM, was proposed in [19]; a SOM model in which the coordinates of neurons on the map are not constant but are subject to dynamic changes during the learning process was proposed in [14]; multilayer, hierarchical neural architectures based on Kohonen networks implementing clustering, used to recognize certain types of images, were proposed in [11]. A good review of various SOM variants has been presented by Moshou in the monograph [18].

3 Self-Organizing Prediction Maps (SOPM)

The original SOM model and its derivative tools, dedicated to clustering and its visualization, are usually not applicable to prediction problems. In turn, predictive neural networks are not equipped with explanation and/or visualization capabilities. A solution to these problems is the concept of the Self-Organizing Prediction Map (SOPM), a modification of SOM proposed by Morajda and Paliwoda-Pękosz in paper [16] (a modification of SOPM, called FLOPM, has also been presented by the same authors in [17]). SOPMs enable:

  • clustering of available (used for model training) patterns (feature vectors) with special respect to a selected feature of special meaning (denoted here by χ),

  • visualization of the clustering results in a special map,

  • making predictions of the special feature χ for new patterns, together with numerical and visual analysis of these predictions.

The basic assumption of SOPM is that a selected variable χ (one of the features xi describing each pattern included in the dataset undergoing analysis) has a special (key) meaning in the data mining process or is accepted as the dependent variable in a prediction task. In clustering patterns from a given dataset, the research inquiry can involve recognition and visualization of the clusters’ arrangement with separate, particular consideration of the variable χ. Consequently, in SOPM models the idea of modifying (in relation to SOM) the projection between the multidimensional feature space (in which the analysed patterns are positioned) and the set of neurons placed in a rectangular XY map relies on the following rules (see Fig. 1):

  a) the key variable χ is projected only onto the coordinate Y,

  b) the other variables (features) xj (j = 1, 2, …, n; xj ≠ χ) are mapped only onto the coordinate X.

The projection is realized by a special training algorithm using the analysed dataset.

Fig. 1. Scheme of mapping of a multidimensional feature space onto the rectangular layer of neurons in SOPM model (small circles represent positions of neurons in the map). Source: [16]

If the variable χ is qualitative and expressed on an ordinal scale with a finite set of ordered values, then the projection of χ onto Y according to rule a) is simple: subsequent rows of the SOPM map represent subsequent ordered values of χ (the number of rows equals the number of χ values).

Let us now assume that the variable χ is continuous, i.e. takes real values from a certain range D ⊂ ℜ. Let N denote the number of rows of neurons in the SOPM. The projection of χ onto Y is then executed as follows (an illustrative sketch follows this list):

  • all values of χ from the training dataset are sorted into an ascending sequence, which is then cut into N equally numerous subsequences,

  • each subsequence determines a certain range Ri (i = 1, 2, …, N) of χ values, so that for any i each value from Ri is not greater than any value from Ri+1, and R1 ∪ R2 ∪ … ∪ RN = D,

  • the ranges Ri (i = 1, 2, …, N) are assigned to subsequent rows of the SOPM; these rows in turn have numerical coordinates yi (i = 1, 2, …, N) on the Y axis, where yi is the centre of Ri,

  • consequently, each value of the variable χ can be assigned to a certain row (range Ri) and finally projected onto the respective value yi.
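For illustration, a minimal sketch of this projection of a continuous χ onto map rows is given below (Python/NumPy; the function names chi_to_rows and row_of are hypothetical and this is not the author's implementation):

```python
import numpy as np

def chi_to_rows(chi_values, n_rows):
    """Split sorted chi values into N equally numerous subsequences;
    return the ranges R_i and the row coordinates y_i (centres of R_i)."""
    chi_sorted = np.sort(np.asarray(chi_values, dtype=float))
    subsequences = np.array_split(chi_sorted, n_rows)           # N equally numerous parts
    ranges = [(s[0], s[-1]) for s in subsequences]               # ranges R_i
    centres = np.array([(lo + hi) / 2.0 for lo, hi in ranges])   # coordinates y_i
    return ranges, centres

def row_of(chi, ranges):
    """Assign a chi value to the row whose range R_i covers it."""
    uppers = np.array([hi for _, hi in ranges])
    return min(int(np.searchsorted(uppers, chi)), len(ranges) - 1)
```

For the Boston housing experiment described in Sect. 5 (500 training patterns, N = 20), such a split yields 25 values of χ per range, in line with the parameters listed there.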

The proposed training algorithm of SOPM is a simple modification of the well-known SOM training procedure (see e.g. [8]), adjusted to the above-mentioned concept of SOPM data mapping as follows:

  • Step 1. Consider a set {(xp, χp), p = 1, 2, …, J} as a training dataset (representing the analysed phenomenon), where xp is the vector of features (variables) other than χ

  • Step 2. p ← 1

  • Step 3. Deliver xp to the SOPM input; find the row of neurons corresponding to χp

  • Step 4. In the selected row find the neuron generating the lowest signal (i.e. the distance between xp and the neuron’s weight vector w) and accept it as the winning neuron np

  • Step 5. In the SOPM output map determine the neighbourhood for (around) np

  • Step 6. Adjust weights w (by adding Δw to w) for all neurons from this neighbourhood according to the rule:

    $$ \Delta {\bf{w}} = \eta \cdot s\left( {n_m } \right) \cdot ({\bf{x}}_p - {\bf{w}}) $$
    (1)

    where η is a learning coefficient (0 < η < 1), and s(nm), where 0 < s(nm) ≤ 1 and s(np) = 1, is the value of the neighbourhood function for the currently trained neuron nm belonging to the determined neighbourhood of np

  • Step 7. pp + 1; if pJ go to step 3, otherwise go to step 8

  • Step 8. If the end-of-training condition is not fulfilled go to step 2, otherwise stop.

It should be noted that the main and crucial difference between the SOPM training algorithm and the classical training procedure for Kohonen’s SOM lies in the constraint that the winning neuron (and then its neighbourhood) is selected from a strictly determined row of map neurons, i.e. the row corresponding to the current value of χp. It should also be noted that the classic SOM training algorithm is fully unsupervised, whereas in SOPM the procedure is mixed: supervised as regards the key variable χ, and unsupervised with respect to all other features (see [16] for details).
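A compact sketch of this training loop is given below (Python/NumPy; the Gaussian neighbourhood function and the linearly decaying learning coefficient are illustrative assumptions, not the authors' published settings). The argument row_idx holds, for each training pattern, the index of the map row corresponding to its χ value (e.g. obtained with the binning sketch above):

```python
import numpy as np

def train_sopm(X, row_idx, n_rows, n_cols, epochs=100,
               eta_start=0.5, eta_end=0.05, sigma=2.0, seed=0):
    """X: (J, n) standardized feature vectors; row_idx: row assigned to each chi_p."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.5, 1.5, size=(n_rows, n_cols, X.shape[1]))  # neuron weights
    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :]
    for epoch in range(epochs):
        eta = eta_start + (eta_end - eta_start) * epoch / max(epochs - 1, 1)
        for x, i in zip(X, row_idx):
            d = np.linalg.norm(W[i] - x, axis=1)     # Step 4: distances in the row of chi_p
            k = int(np.argmin(d))                    # winning neuron n_p = (i, k)
            # Steps 5-6: Gaussian neighbourhood s(n_m) around n_p, with s(n_p) = 1
            s = np.exp(-((rows - i) ** 2 + (cols - k) ** 2) / (2.0 * sigma ** 2))
            W += eta * s[:, :, None] * (x - W)       # weight adjustment, rule (1)
    return W
```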

Fig. 2. Interpretation of SOPM maps (small circles represent particular neurons distributed in a rectangular map). a) Hypothetical effect of clustering process (indicated groups of neurons represent multidimensional clusters of patterns). b) Utilisation of a trained model in prediction of χ variable for a new pattern (explanation in text).

After completing the training algorithm, each pattern from the analysed dataset is assigned to a certain neuron in the output map of SOPM – the neuron that finally wins (see Step 4) for the given pattern. Consequently, particular neurons “collect” the patterns assigned to them, and groups of such neighbouring neurons in the map, having large “collections” of patterns, represent corresponding clusters of objects placed in the multidimensional feature space (see the hypothetical example in Fig. 2a). However, in the case of SOPMs, such clustering is executed also with respect to the special variable χ projected onto a separate (vertical) axis of the map of neurons.

Apart from such special clustering and its visualization, a trained SOPM can also be utilised for prediction of the χ variable for new patterns (see the example in Fig. 2b). After delivering the input vector x of a new pattern to the model input, the winning neuron (i.e. the one generating the lowest signal) out of the whole map is found (black circle in Fig. 2b). The coordinate yp of the row it belongs to (dotted line in Fig. 2b) is the predicted value of χ for this pattern. Moreover, if the winning neuron belongs to a certain previously identified group, the considered pattern can be assigned to the corresponding cluster of objects. This delivers additional information that better explains the prediction result. Also, a further fine-tuned prediction process applied only to the identified cluster (with the use of other methods) is possible.

Additionally, numerical (or graphical) analysis of signals from the neighbourhood of the winning neuron can (informally) indicate the uncertainty of the prediction: if the winning neuron is distinctly identified, the uncertainty is lower; however, if many neighbouring neurons (belonging to many rows) generate similar signals, the uncertainty of the prediction is higher.
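A short sketch of this prediction step (again illustrative Python/NumPy with hypothetical function names; centres holds the row coordinates yi, e.g. as returned by the binning sketch above):

```python
import numpy as np

def predict_sopm(W, centres, x):
    """Point prediction of chi for a new pattern x from a trained SOPM.
    W: (n_rows, n_cols, n) weights; centres: row coordinates y_i."""
    d = np.linalg.norm(W - x, axis=2)                   # signals (distances) of all neurons
    i, k = np.unravel_index(np.argmin(d), d.shape)      # winning neuron over the whole map
    return centres[i], (int(i), int(k)), d              # prediction y_i, winner, signal map
```

The returned signal map d can also be inspected (numerically or as an image) to informally assess the prediction uncertainty, as described above.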

4 Proposition of PDCM Neural Map as an Extended Version of SOPM, Enabling Probabilistic Prediction

This section presents the concept of a modification (expansion) of the SOPM method, named PDCM, which (while preserving all SOPM capabilities) additionally allows the generation of the a posteriori probability distribution (in the Bayesian sense) for the value of the predicted variable χ.

4.1 Approximation of the Probability Distribution by Machine Learning Models

Let us consider any machine learning model designed to solve the classification problem and trained:

  • by minimisation of SSE (sum of squared errors):

    $$ SSE = \frac{1}{2}\sum_p {\sum_i {(\theta_i^{(p)} - y_i^{(p)} )^2 } } $$
    (2)

where: yi(p) – the signal of the i-th output neuron for the p-th training pattern,

  θi(p) – the desired training value of the i-th output neuron (related to the i-th class) for the p-th training pattern.

  • using training output (desired) values of 1 and 0 as follows:

    $$ \theta_i^{(p)} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if}}\,i\,{\text{indicates}}\,{\text{correct}}\,{\text{class}}\,{\text{for}}\,p{\text{th}}\,{\text{pattern}}} \hfill \\ 0 \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right. $$
    (3)

Ruck et al. showed in [26] that a multilayer perceptron (or another supervised machine learning model – there are no formal limitations on its structure) designed to solve the classification problem and trained according to postulates (2) and (3) approximates the Bayesian optimal discriminant function. Moreover, it was shown that the output signals yi of such a model approximate (in the sense of minimisation of SSE) the Bayesian a posteriori probabilities that the input vector x belongs to the particular classes, i.e.:

$$ y_i ({\bf{x}}) \approx P(\omega_i |{\bf{x}})\quad {\text{for}}\,{\text{outputs}}\,({\text{classes}})\quad i = 1,2, \ldots ,N $$
(4)

where ωi denotes i-th class.

If the predicted output variable χ is continuous, i.e. takes real values, a given class ωi is related to a specific range Ri of the variable χ (see Sect. 3). Then the set of all outputs yi (i = 1, 2, …, N) can be used to approximate the entire conditional Bayesian probability density distribution (a posteriori) of the predicted variable χ (under the condition that the vector x has appeared). However, in order to approximate the probability distribution of χ, it is necessary to scale the yi signals linearly. The scaling factor depends on the length of the range D (D is the set of all χ values, D = R1 ∪ R2 ∪ … ∪ RN) and on the value N; this scaling should ensure that the area under the probability distribution graph (i.e. the total probability) is equal to 1. This condition can be written as:

$$ \mathop {\lim }\limits_{N \to \infty } \sum_{i = 1}^N {(\lambda \cdot y_i )} \frac{{\chi_{\max } - \chi_{\min } }}{N} = 1, $$
(5)

where: λ – scaling factor (multiplier of yi signals),

  χmin, χmax – limits of range D.

As \(\sum\limits_{i = 1}^N {P(\omega_i |{\bf{x}})} = 1\), and consequently \(\sum\limits_{i = 1}^N {y_i } = 1 \) (with exact approximation (4)), condition (5) becomes fulfilled (independently of the value of N) for the scaling factor:

$$ \lambda = \frac{N}{{\chi_{\max } - \chi_{\min } }}. $$
(6)

It should be noted that, despite these theoretical considerations, in machine learning practice many elements can influence the accuracy of approximation (4). For example, the model architecture, the training parameters, the selection of patterns in the training set, or (here) the accepted number N of ranges Ri may have a significant impact on this accuracy.
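A small sketch of this rescaling is given below (Python/NumPy). It assumes, for simplicity of presentation, equal-width bins of length (χmax − χmin)/N over D, as implied by formula (5); in the model itself the ranges Ri are equally numerous rather than equal-width. The outputs are additionally renormalized here because approximation (4) is not exact in practice:

```python
import numpy as np

def posterior_density(y, chi_min, chi_max):
    """Scale output signals y_i into an approximate posterior density of chi."""
    y = np.clip(np.asarray(y, dtype=float), 0.0, None)
    y = y / y.sum()                                   # enforce sum of probabilities = 1
    n = y.size
    lam = n / (chi_max - chi_min)                     # scaling factor, formula (6)
    edges = np.linspace(chi_min, chi_max, n + 1)      # equal-width bins over D
    return edges, lam * y                             # density value on each bin
```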

4.2 Main Assumptions of the PDCM Model as a Modified SOPM Network

In order to adapt the SOPM model to the additional task of creating a probability distribution for the predicted variable χ, the structure of the model should be expanded with a new layer of nodes that generate output signals yi (i = 1, 2, …, N) approximating probabilities according to (4), see Fig. 3. During the training stage, the desired signals for these nodes are 0 or 1, according to (3). Each node aggregates signals from map neurons belonging to a certain map row representing a given range Ri of the variable χ. There are no weights assigned to the connections between the middle layer and the output layer (such connections are shown – for clarity – only for the first and last rows in Fig. 3).

Fig. 3. Postulated structure of PDCM network.

Firstly, let us consider the signals generated by the neurons of the middle layer (map). So far, in SOPM, these signals represented (for a given input vector x) the values of d = ║x − w║, i.e. the distance between x and the neuron’s weight vector w. Now let us introduce an exponential activation function for these neurons, so that they generate the signals:

$$ y_{ik} = {\text{e}}^{-d} = {\text{e}}^{-\parallel {\bf{x}}-{\bf{w}}\parallel } $$
(7)

where i, k are coordinates of the neuron in the map.

In the PDCM model, the signals (7) from all neurons belonging to a given i-th map row (corresponding to a range Ri, i = 1, 2, …, N) are aggregated by an output node (see Fig. 3) into an output signal yi according to the postulated formula:

$$ y_i = \sqrt[K]{{y_{i1} \cdot y_{i2} \cdot \ldots \cdot y_{iK} }} = \left( {y_{i1} \cdot y_{i2} \cdot \ldots \cdot y_{iK} } \right)^\frac{1}{K} $$
(8)

where K is the number of neurons in the row (horizontal size of the map).
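The resulting forward pass of the PDCM (formulas (7) and (8)) can be sketched as follows (illustrative Python/NumPy; note that the geometric mean of the signals exp(−dik) over a row equals exp of minus the arithmetic mean of the distances, which is also numerically convenient):

```python
import numpy as np

def pdcm_forward(W, x):
    """W: (N, K, n) weight array of the map; returns middle-layer and output signals."""
    d = np.linalg.norm(W - x, axis=2)     # distances d_ik of x to every map neuron
    y_ik = np.exp(-d)                     # middle-layer signals, formula (7)
    y_i = np.exp(-d.mean(axis=1))         # row outputs = (prod_k y_ik)^(1/K), formula (8)
    return y_ik, y_i
```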

4.3 Proposed Training Procedure of PDCM

As in the SOPM model, the training of the PDCM network is dual: supervised with respect to the key variable χ (which is the dependent variable in estimation and forecasting problems), and unsupervised with respect to the other variables. This approach enables (as in SOPMs) the implementation of the clustering process (cluster analysis) in the feature map along with its visualization, as well as solving the problem of estimating (predicting) the variable χ. However, while in the case of SOPM the supervision relies only on indicating the appropriate row of the map (corresponding to the value of χ for a considered pattern x), in the PDCM model, similarly to classic supervised neural networks, desired output signals θ (according to rule (3)) must be given for all nodes of the output layer.

Let us note that, due to the specific character of the connections between the middle layer and the output layer of PDCM (Fig. 3), and assuming (at first) no relations between rows (no neighbourhood at all) in the map, it is possible to consider separately N parts of the whole model, each containing one (the i-th) row of the map and one corresponding output node; let us denote such an i-th part of the model by PDCMi (i = 1, 2, …, N). The neighbourhood aspect during the training stage of the whole PDCM model will be considered later, in Subsect. 4.4.

Below, a single step of PDCMi training for the p-th training pattern xp is presented (the indexes i and p are omitted below for clarity). The minimized error function (2), denoted here by E, is now given by the formula:

$$ E({\bf{W}}) = \frac{1}{2}(\theta - y)^2 = \frac{1}{2}\delta^2 $$
(9)

where: δ = θ − y denotes the output error value for the given (p-th) pattern,

W denotes the vector of all weights of PDCMi.

The training procedure (based on the classic backpropagation algorithm) aims to minimize function (9) using the desired output signals θ determined according to rule (3).

Let wk (k = 1, 2, …, K) denote the weight vector of the k-th neuron of PDCMi – on the whole PDCM map it is the neuron having coordinates (i, k). A formula describing one step of adjusting the weights wk (by a correction vector Δwk), based on the steepest descent method (used also in the classic backpropagation algorithm), is given by the equation:

$$ \Delta {\bf{w}}_k = \eta \cdot (-\nabla E\left( {{\bf{w}}_k } \right))\quad \left( {k = {1},{2}, \ldots ,K} \right) $$
(10)

where: ∇E(wk) is the part (relating to wk) of the gradient of the error function (2) at the point wk,

η is a value of training coefficient, 0 < η < 1.

Now the problem of specifying the training algorithm for the PDCM network (as a modification of the error backpropagation algorithm in the version with independent weight correction for each training pattern) reduces to finding the vectors ∇E(wk). As:

$$ \nabla E({\bf{w}}_k ) = \frac{{\partial E({\bf{W}})}}{{\partial {\bf{w}}_k }} = \frac{{\partial E({\bf{W}})}}{\partial y} \cdot \frac{\partial y}{{\partial y_k }} \cdot \frac{\partial y_k }{{\partial {\bf{w}}_k }} $$
(11)

then, considering dependencies (7), (8) and (9), we obtain (note that, for clarity, the index i has been omitted everywhere, particularly for y, yk and wk):

$$ \nabla E(w_k ) = - \delta \frac{1}{K}(y_1 \cdot y_2 \cdot \ldots \cdot y_K )^{\frac{1}{K} - 1} \cdot \frac{y_1 \cdot y_2 \cdot \ldots \cdot y_K }{{y_k }} \cdot e^{ - d} \cdot \left( { - \frac{\partial d}{{\partial {\bf{w}}_k }}} \right) $$
(12)

and, after simplification, taking again into consideration (7) and (8):

$$ \nabla E({\bf{w}}_k ) = \frac{1}{K}\delta \cdot y \cdot \left( {\frac{\partial d}{{\partial {\bf{w}}_k }}} \right). $$
(13)

Assuming the Euclidean metric to determine the distance in the weights space, the distance d = ║xwk║ is expressed by:

$$ d = \sqrt {\sum_j {(x_j - w_{jk} )^2 } } $$
(14)

where j is the index for all subsequent elements of vectors x and wk.

Now, assuming that d ≠ 0, we obtain

$$ \frac{\partial d}{{\partial w_{jk} }} = \frac{1}{2d}2 \cdot (x_j - w_{jk} ) \cdot ( - 1) $$
(15)

and then, after applying it in (13), the gradient is determined as

$$ \nabla E(w_k ) = - \frac{1}{K}\delta \cdot y \cdot \frac{{{\bf{x}} - {\bf{w}}_k }}{{\parallel {\bf{x}} - {\bf{w}}_k \parallel }} $$
(16)

Finally, considering (10) and (16), the one-step weight correction vector Δwk, is determined as:

$$ \Delta {\bf{w}}_k = \frac{1}{K}\eta \cdot \delta \cdot y \cdot \frac{{{\bf{x}} - {\bf{w}}_k }}{{\parallel {\bf{x}} - {\bf{w}}_k \parallel }}\quad \quad (k = 1,2, \ldots ,K). $$
(17)

It should be noted that the last factor in Eq. (17) represents a unit-length vector directed from the point wk towards the point x. The direction determined in this way is the direction of the entire weight correction vector Δwk (its sense and length, however, may vary and depend on the other factors in (17)).

Such training steps are repeated for all training patterns from the considered dataset.
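A minimal sketch of one such training step for a single PDCMi is given below (Python/NumPy, assuming the Euclidean metric and dk ≠ 0 as in the derivation; neighbourhood coefficients are not yet included, i.e. ak = 1):

```python
import numpy as np

def pdcm_row_step(W_row, x, theta, eta):
    """One update of the weights of one map row (PDCM_i) according to (17).
    W_row: (K, n) weights; theta: desired output of the row's node (0 or 1, rule (3))."""
    K = W_row.shape[0]
    diff = x - W_row                              # vectors x - w_k
    d = np.linalg.norm(diff, axis=1)              # distances d_k (assumed nonzero)
    y = np.exp(-d.mean())                         # row output, formulas (7)-(8)
    delta = theta - y                             # output error, formula (9)
    unit = diff / d[:, None]                      # unit vectors from w_k towards x
    return W_row + (eta * delta * y / K) * unit   # correction (17) applied to each w_k
```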

4.4 Generalized Training Procedure Taking into Account Neighbourhood Aspects

Following the methodology of training SOM and SOPM networks, let us now consider – for the PDCM network – the possibility of introducing the idea of the neighbourhood and the related principle of training topologically adjacent map neurons of the middle layer (which map adjacent areas of the feature space) in a similar way.

For a single PDCMi network (i = 1, 2, …, N) the neighbourhood concept involves the requirement to differentiate the lengths of the weight correction vectors Δwk for k = 1, 2, …, K (both when the considered PDCMi contains the winning neuron and when it does not). The training rule (17) should then be modified as follows:

$$ \Delta {\bf{w}}_k = a_k \frac{1}{K}\eta \cdot \delta \cdot y \cdot \frac{{{\bf{x}} - {\bf{w}}_k }}{{\parallel {\bf{x}} - {\bf{w}}_k \parallel }}\quad \left( {k = {1},{2}, \ldots ,K} \right), $$
(18)

where ak are neighbourhood coefficients (values of a neighbourhood function s) responsible for differentiating lengths of vectors Δwk.

Certain theoretical analyses carried out by the author have led to the conclusion that for a single PDCMi network the dependency:

$$ \sum_{k = 1}^K {a_k } = K $$
(19)

should be ensured.

Considering now the neighbourhood aspect for the whole PDCM during the training stage (i.e. the aspect essential for ensuring a proper organization of the map – the PDCM middle layer – in order to perform the clustering process), there is a need to introduce a certain neighbourhood function s, as in the SOM and SOPM models. Here, the function s should be responsible for determining the neighbourhood coefficients ak for all neurons in the map during a given training step. The “centre” of the function s is always the winning neuron, selected separately for each training pattern exactly according to the rules adopted in SOPM (see Sect. 3). However, in PDCM the neighbourhood function s should additionally take into account rule (3) and Eq. (19). The optimal selection of the function s is a matter of ongoing experiments; the current results of these exploratory analyses were implemented by the author in the research presented in the next section. One possible choice is sketched below.
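One possible choice, used here purely as an illustration (it is an assumption of this sketch, not the model's final neighbourhood function), is a Gaussian function centred on the winning neuron and rescaled row by row so that every row satisfies constraint (19):

```python
import numpy as np

def neighbourhood_coefficients(n_rows, n_cols, win_row, win_col, sigma=2.0):
    """Gaussian neighbourhood around the winner, rescaled so each row sums to K."""
    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :]
    g = np.exp(-((rows - win_row) ** 2 + (cols - win_col) ** 2) / (2.0 * sigma ** 2))
    a = g * (n_cols / g.sum(axis=1, keepdims=True))   # per-row sum equals K, Eq. (19)
    return a
```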

The definition of the learning rule (formula (18)) for the PDCM network, supplemented by a method of determining the neighbourhood coefficients that ensures the implementation of the pattern grouping process, allows the creation of the network training algorithm. The algorithm has been implemented in the form of a computer program written by the author in C++, which is the basis for the research discussed in the next section.

5 Application of the PDCM Model in Real Estate Market Analysis

Below, the results of applying the PDCM model to the problem of real estate value estimation are presented. The Boston housing dataset, available in the UCI ML Repository (https://archive.ics.uci.edu/ml/machine-learning-databases/housing/), has been used in the presented research. This dataset has often been used in research concerning clustering or regression problems (see e.g. [4, 13, 23]).

The Boston dataset contains 506 patterns described by 14 numeric variables. The dependent feature χ represents the median of the estate prices in a given census area (MEDV). The remaining 13 variables describe certain features influencing real estate values and, after standardisation, constitute the input vector of the PDCM. Six observations out of the 506 (their numbers in the original Boston dataset are: 48, 137, 197, 314, 435, 457) were selected randomly for ultimate testing (creating the test set); the remaining 500 patterns form the training set. A data-preparation sketch is given below.
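The following sketch (Python with pandas) corresponds to the above description; several details are assumptions of this sketch: the column names follow the UCI housing.names description, housing.data is assumed to contain one record per line, the test-pattern numbers are treated as 1-based indices, and z-score standardisation with training-set statistics is used:

```python
import numpy as np
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data"
COLS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
        "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

df = pd.read_csv(URL, sep=r"\s+", header=None, names=COLS)   # 506 patterns, 14 variables
test_numbers = [48, 137, 197, 314, 435, 457]                 # pattern numbers from the paper
test_idx = [n - 1 for n in test_numbers]                     # assuming 1-based numbering
train, test = df.drop(index=test_idx), df.loc[test_idx]

feature_cols = COLS[:-1]                                     # 13 input features
mu, sd = train[feature_cols].mean(), train[feature_cols].std()
X_train = ((train[feature_cols] - mu) / sd).to_numpy()       # standardized inputs
chi_train = train["MEDV"].to_numpy()                         # dependent variable chi
X_test = ((test[feature_cols] - mu) / sd).to_numpy()
chi_test = test["MEDV"].to_numpy()
```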

The following PDCM model parameters were adopted (a training-loop sketch that uses them follows the list):

  • number of training epochs (presentations of whole training set): 300,

  • PDCM map dimensions: X axis – 10 (K = 10), Y axis – 20 (N = 20),

  • each range R1, R2,…, RN contains 25 values of χ, taken from the training set,

  • the rule (18) has been adopted for model training,

  • the training coefficient η decreased linearly during training from 0.7 to 0.07,

  • initial weights for map neurons were selected randomly from range [−1.5, 1.5].
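The following minimal end-to-end sketch ties together the earlier fragments (forward pass (7)–(8), update rule (18), the row-wise constraint (19)) with the parameters listed above; the Gaussian neighbourhood and its width are assumptions of this sketch, not necessarily the settings used in the author's C++ program:

```python
import numpy as np

def train_pdcm(X, row_idx, n_rows=20, n_cols=10, epochs=300,
               eta_start=0.7, eta_end=0.07, sigma=2.0, seed=0):
    """X: standardized inputs; row_idx: row index of each pattern's chi value."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.5, 1.5, size=(n_rows, n_cols, X.shape[1]))      # initial weights
    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :]
    for epoch in range(epochs):
        eta = eta_start + (eta_end - eta_start) * epoch / max(epochs - 1, 1)
        for x, i_true in zip(X, row_idx):
            diff = x - W                                               # x - w for every neuron
            d = np.maximum(np.linalg.norm(diff, axis=2), 1e-12)
            y = np.exp(-d.mean(axis=1))                                # row outputs, (7)-(8)
            k_win = int(np.argmin(d[i_true]))                          # winner in the row of chi (SOPM rule)
            g = np.exp(-((rows - i_true) ** 2 + (cols - k_win) ** 2) / (2.0 * sigma ** 2))
            a = g * (n_cols / g.sum(axis=1, keepdims=True))            # constraint (19) per row
            theta = np.zeros(n_rows)
            theta[i_true] = 1.0                                        # desired outputs, rule (3)
            coef = a * (eta * (theta - y) * y / n_cols)[:, None]       # factors of rule (18)
            W += coef[:, :, None] * diff / d[:, :, None]               # corrections along (x - w_k)/||x - w_k||
    return W
```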

The effect of clustering, expressed by the numbers of training patterns x assigned to particular neurons of the map (middle layer) of the PDCM network, is presented in tabular and graphical form in Fig. 4 (note also the relation to Fig. 2a).

Fig. 4. Clustering results of the Boston real estate dataset

Analysis of the clustering results (Fig. 4) shows a tendency to create clusters of training patterns assigned to topologically adjacent PDCM map neurons. This method also allows (like SOPM) the identification of clusters with respect to the values of a particular feature (the variable χ – here the median of the estate prices, MEDV).

The PDCM method also (like SOPM) allows, for a given new pattern, determining a predicted value of the variable χ (along with an informal visual or numerical assessment of the prediction uncertainty), based on the identification of the winning neuron (and the analysis of signals of adjacent neurons). However, in PDCM (contrary to SOPM) it is additionally possible to generate (approximate) a formal a posteriori probability density distribution for the predicted (estimated) variable χ – this property is the key functional feature of this model. The results of testing the PDCM network in the real estate valuation process for six test cases are presented below.

Figure 5 graphically shows the output signals generated according to formula (7) by the neurons of the PDCM map (middle layer) in response to the selected (exemplary) test patterns 1 and 3. The darker the area in the graph, the stronger the neuron’s signal. The winning neuron (generating the highest signal – see formula (7) – and indicating the point prediction of χ for a given pattern) is placed in the black area. For test pattern 1 the predicted value is 18.7; for test pattern 3 the prediction is 32.9.

Fig. 5. Output signals of map neurons for two selected test patterns 1 and 3

The testing results (based on the point predictions indicated by the winning neuron, i.e. the one generating the strongest signal) for all six test patterns are presented in Table 1. The mean absolute error of estimating the MEDV value for the test set is 1.55, which (for this specific dataset) should be regarded as a good result.

Table 1. Testing results (based on point predictions) for all six testing patterns

Figure 6 shows the graphs of the estimated probability density distributions of MEDV for the six test patterns. These distributions have been approximated on the basis of the signals generated by the output nodes (output layer) of the PDCM model, according to formula (4). The black triangular mark on the horizontal axis shows the actual value of MEDV.

Fig. 6. Approximated probability density distributions for MEDV for six test patterns

It should be noted that, apart from the above-mentioned data analysis possibilities offered by the proposed PDCM model, further data exploration options, based on deeper investigation of the signals of all map neurons, are delivered by this method. For example, for test pattern 3, a fairly large area of strong signals can be identified at the top of the PDCM map (see Fig. 5); this fact is also confirmed by the shape of the probability density function (Fig. 6, pattern 3). The neuron generating the strongest signal in this area (the winning neuron), with coordinates X = 10, Y = 18 (column = 10, MEDV = 32.9), has 10 training patterns assigned to it (see the table in Fig. 4, the bold number 10). These 10 cases may constitute (for a real estate market analyst) a comparative base helpful in additionally justifying the estate price estimation.

6 Conclusions

The paper presents the concept of a new neural tool, PDCM, dedicated to a relatively wide range of data analysis and data exploration tasks, i.e. special clustering, cluster visualization, dependent-variable prediction (together with its possible visual analysis and justification), and probabilistic prediction on the basis of an approximation of the a posteriori probability density distribution. Basic theoretical considerations concerning the proposed model have been presented. Also, the presented analyses of the application of the PDCM model to real estate market data indicate the significant effectiveness of this tool and the quite rich possibilities of this method in data mining.

However, much future research remains to be done; especially desirable analyses concern the selection of PDCM parameters (e.g. map dimensions), the coefficients used in the training algorithm, and the shape of the neighbourhood function (the neighbourhood coefficients). Also, testing the model with the use of other datasets will be beneficial.