1 Introduction

Intuition refers to knowledge acquired without inference and/or the use of reason [25]. Philosophically, there are several definitions of intuition; the most widely used is “Thoughts that are reached with little apparent effort, and typically without conscious awareness” [11], and intuition is generally considered the opposite of a rational process. From a machine learning perspective, training a supervised classifier is a rational process: the classifier is trained with labeled data, which allows it to learn a decision boundary. Traditional unsupervised learning methods, in contrast, do not map the learnt patterns to their corresponding class labels. Semi-supervised approaches bridge this gap by leveraging unlabeled data to better perform supervised learning tasks. However, the final task (say, classification) is still performed by a supervised classifier using labeled data, with some additional knowledge from unsupervised learning. The notion of intuition would mean that the system performs tasks using only unlabeled data, without any supervised (rational) learning. In other words, intuition is context-dependent guesswork that can be incorrect at times. In a typical learning pipeline, the concept of intuition can be used for a variety of purposes, from training data selection up to and including decision-making. Heuristics are the simplest form of intuition; they bypass, or are used in conjunction with, rational decisions to obtain quick approximate results. For example, heuristics can be used for (1) choosing new data points in an online active learning scenario [6], (2) feature representation [7], (3) feature selection [10], or (4) the choice of classifier and its parameters [4].

Table 1 Comparison of existing popular machine learning paradigms along with the proposed intuition learning paradigm

Table 1 shows the comparison of existing popular machine learning paradigms. Supervised learning attempts to learn an input–output mapping function on a feature space using a set of labeled training data. Transfer learning aims to improve the target learning function using the knowledge in a source (related) domain and source learning tasks [22]. Many types of knowledge transfer, such as classification parameters [17], feature representations [9], and training instances [12], have been tested to improve the performance of supervised learning tasks. Semi-supervised learning utilizes additional knowledge from unlabeled data, drawn from the same distribution and having the same task labels as the labeled data. Much of the research has focused on unsupervised feature learning, i.e., creating a feature subspace from the unlabeled data onto which the labeled data can be projected to obtain a new feature representation [5]. In 2007, Raina et al. [23] proposed a framework termed “Self-taught learning” to create a generic feature subspace using sparse autoencoders irrespective of the task labels. Self-taught learning dismisses the same class label assumption of semi-supervised learning and forms a generic high-level feature subspace from the unlabeled data, onto which the labeled data can be projected.

Fig. 1 Comparing the different learning paradigms such as supervised, semi-supervised, and transfer learning with the proposed intuition learning paradigm. Intuition learning transfers the knowledge to perform classification from unlabeled data using reinforcement learning

As shown in Fig. 1, we postulate a framework that supplements a supervised or semi-supervised classifier with intuition decisions at the decision level. The decisions drawn by the reinforcement learning block in Fig. 1 are called intuition because they are learnt using only the unlabeled data, with an indirect reward from a teacher. Existing algorithms, broadly, either require training labels to build a classifier or borrow the classifier parameters from an already trained classifier. Direct or indirect training is not always possible, as obtaining data labels is very costly. To address this challenge, we propose a novel paradigm in which an unsupervised task performance mechanism is learnt from cumulative experience. Intuition is modeled as a learning framework that provides the ability to learn a task completely from unlabeled data. By using continuous state reinforcement learning as a classifier, the framework learns to perform the classification task without the need for explicitly labeled data. Reinforcement learning helps in adapting a randomly initialized feature space to the specific task at hand, where a parallel supervised classifier is used as a teacher. As the proposed framework is able to learn a mapping function from the input data to the output class labels without the requirement for explicit training, it functions similarly to human intuition, and we term this approach Intuition Learning.

1.1 Research Contributions

This research proposes a novel intuition learning framework that enables algorithms to learn a specific classification or regression task completely from unlabeled data. The major contributions of this research are as follows:

  • A continuous state reinforcement learning-based classification framework is proposed to map input data to output class label, without the explicit use of training.

  • A residual Q-learning-based function approximation method is proposed for learning the feature representation of task-specific data. A novel reward function that does not require class labels is designed to provide feedback to the reinforcement-based classification system.

  • A context-dependent addition framework is proposed, where the result of the intuition framework can be supplemented based on the confidence of the trained supervised or semi-supervised mapping function.

Fig. 2 A block diagram outlining how a feature space can be adapted using a reinforcement learning algorithm with feedback from a supervised classifier trained on limited task-specific data

2 An Intuition Learning Algorithm

The basic idea of the proposed intuition learning framework is presented in Fig. 2. Given a large set of unlabeled data, different kinds of feature representations are extracted to describe the data, irrespective of the task at hand. To further leverage the knowledge from unlabeled data, a continuous state reinforcement learning mechanism is used to perform the given classification task. As reinforcement learning is a continuous process driven by a reward-based feedback mechanism, the classification performance improves with time. On the one hand, reinforcement learning acts as a classifier; on the other hand, it continuously adapts the feature representation to the given task. Thus, given multiple tasks, the proposed intuition learning framework can adapt the generic feature space to be consistent with the corresponding task.

Let \(\{(I_l^{(1)}, y^{(1)}), (I_l^{(2)}, y^{(2)}), \ldots , (I_l^{(m)}, y^{(m)})\}\) be the set of m labeled training data drawn i.i.d. from a distribution D. The labeled data are represented as \(\{(x_l^{(1)},y^{(1)}), (x_l^{(2)},y^{(2)}), \ldots , (x_l^{(m)},y^{(m)})\}\), where \(x_l^{(i)} \in {R}^n\) is the feature representation of the data \(I_l^{(i)}\) and \(y^{(i)} \in [1, 2,\ldots , C]\) denotes the class label corresponding to \(x_l^{(i)}\). Let the set of unlabeled data be \(\{I_u^{(1)}, I_u^{(2)}, \ldots , I_u^{(p)}\}\), where the subscript u indicates that the data are unlabeled. Contrary to self-taught learning [23], we do not assume that the labeled and unlabeled data are drawn from the same distribution D or have the same class labels; however, they should be derived from the same modality. Given a set of labeled data and a large set of unlabeled data, the aim of intuition learning is to learn a hypothesis \(h': X\rightarrow [1, 2,\ldots , C]\) that predicts the label for a given input representation of the data. However, the hypothesis \(h'\) is learnt without the direct use of the labels \(y^{(i)}\) and is used as a supplement to the hypothesis h learnt from the labeled pairs \((x_l^{(i)},y^{(i)})\) in a supervised (or semi-supervised) manner.

2.1 Adapting Feature Representation

From a large set of unlabeled data, many different kinds of feature representations are extracted. Each representation may correspond to a different property of the data that we try to capture. For image data, the features could be color, texture, and shape, while for text data, the features could be n-grams, bag-of-words, and word embeddings. The features can also be a set of different color features or a set of hierarchical n-grams. If the large set of unlabeled data is seen as the world (or the universal set), the features are the different observations made by the algorithm from the world. Similar to human intuition, the set of feature representations extracted is task-independent; later, depending on the learning task, a subset of these features could be dominantly used. This task-independent feature space is similar to the human intuition learnt by observing the environment.

Fig. 3 Overall scheme of the proposed intuition learning algorithm that aids a supervised classifier

Figure 3 provides a detailed description of the proposed intuition learning framework. From the set of unlabeled data \(I_u\), we extract r different kinds of feature representations, \(\{X_{u_1}, X_{u_2}, \ldots , X_{u_r}\}\), where \(X_{u_q} = \{x_{u_q}^{(1)}, x_{u_q}^{(2)}, \ldots , x_{u_q}^{(p)}\}\) and \(x_{u_{q}}^{(j)} \in {R}^{n_q}\). For every feature representation \(q \in [1, 2, \ldots , r]\), we cluster the representation \([x_{u_q}^{(1)}, x_{u_q}^{(2)}, \ldots , x_{u_q}^{(p)}]\) into C clusters using k-means clustering. The centroids of the clusters for the qth feature representation are given as \([z_{u_q}^{(1)}, z_{u_q}^{(2)}, \ldots , z_{u_q}^{(C)}]\). This collection of centroids, for \(q = [1, 2, \ldots , r]\), is called the Intuition-based Feature Subspace (IFS), as it clusters the entire set of unlabeled data into groups based on every observation (feature).
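The construction of the IFS can be illustrated with a minimal sketch. It assumes plain Lloyd's k-means written from scratch; the function names `kmeans` and `build_ifs`, the fixed iteration count, and the toy feature values are hypothetical choices for illustration and are not taken from the original implementation.

```python
import math
import random


def kmeans(points, c, iters=20, seed=0):
    """Plain Lloyd's k-means; returns c centroids as lists of floats."""
    rng = random.Random(seed)
    # Initialise the centroids with c distinct data points.
    centroids = [list(p) for p in rng.sample(points, c)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in range(c)]
        for p in points:
            j = min(range(c), key=lambda k: math.dist(p, centroids[k]))
            groups[j].append(p)
        # Move each centroid to the mean of its group (unchanged if empty).
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def build_ifs(features_by_type, c):
    """features_by_type[q] holds the p unlabeled vectors of feature type q.

    Returns one list of c centroids per feature type (r lists in total).
    """
    return [kmeans(x_q, c) for x_q in features_by_type]


# Toy data: r = 2 feature types, p = 4 unlabeled samples, C = 2 clusters.
color = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
texture = [[1.0], [1.2], [9.0], [9.1]]
ifs = build_ifs([color, texture], c=2)
print(len(ifs), len(ifs[0]))  # 2 2
```

Each feature type is clustered independently, so feature types with different dimensionalities \(n_q\) can coexist in the same IFS.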

2.2 Classification Using Reinforcement Learning

For a given set of m labeled training data, \(\{I_l^{(1)}, I_l^{(2)}, \ldots , I_l^{(m)}\}\), the set of r features (as used for the unlabeled data) are extracted as \([x_{l_q}^{(1)}, x_{l_q}^{(2)}, \ldots , x_{l_q}^{(m)}]\), where \(q = [1, 2, \ldots , r]\). The extracted features are then projected onto the Intuition-based Feature Subspace (IFS) by calculating the distance of features from the corresponding cluster centroids shown as,

$$\begin{aligned} s_{q}^{(i)} = ||x_{l_q}^{(i)} - z_{u_q}^{(j)}||_{2} \end{aligned}$$
(1)

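for \(j = [1, 2,\ldots , C], q = [1, 2,\ldots , r],\) and \(i = [1, 2,\ldots , m]\), so that \(s_{q}^{(i)}\) collects the C distances from the qth feature of the ith sample to the C centroids of that feature. The representation of the ith data point is given by concatenating the distances corresponding to all the features,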

$$\begin{aligned} s^{(i)} = [s_{1}^{(i)}, s_{2}^{(i)},\ldots , s_{r}^{(i)}] \end{aligned}$$
(2)
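A small sketch of Eqs. 1 and 2 follows. The function name `project_onto_ifs` and the toy centroids are hypothetical; the IFS is assumed to be stored as one list of C centroids per feature type.

```python
import math


def project_onto_ifs(sample_features, ifs):
    """Return the state vector s^(i) of length r*C (Eqs. 1 and 2).

    sample_features[q] is the q-th feature vector of one sample;
    ifs[q] is the list of C centroids for feature type q.
    """
    state = []
    for x_q, centroids_q in zip(sample_features, ifs):
        # Eq. 1: Euclidean distance to every centroid of this feature type.
        state.extend(math.dist(x_q, z) for z in centroids_q)
    return state  # Eq. 2: concatenation over the r feature types


ifs = [
    [[0.0, 0.0], [3.0, 4.0]],  # feature type 1, C = 2 centroids
    [[1.0], [5.0]],            # feature type 2, C = 2 centroids
]
sample = [[0.0, 0.0], [2.0]]
print(project_onto_ifs(sample, ifs))  # [0.0, 5.0, 1.0, 3.0]
```

With r = 2 and C = 2 the state has rC = 4 entries, regardless of the dimensionality of the underlying features.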

The obtained representation is succinct, with a fixed dimension of \(rC\times 1\), where r is the number of different feature types extracted and C is the number of clusters. In essence, every value represents the distance from a cluster centroid. In a typical semi-supervised (or self-taught) learning scheme, the mapping between the intuition-based representation and the output class labels, \(\{(s^{(1)},y^{(1)}), (s^{(2)},y^{(2)}), \ldots , (s^{(m)},y^{(m)})\}\), is learnt in a supervised manner. In the proposed intuition learning, however, we attempt to learn the data-label mapping using reinforcement learning, without using the class labels. The aim of reinforcement learning is to learn an action policy \(\pi : s \rightarrow a\), where \(s \in S\) is the current state of the system and a is the action performed in that state. As the setup involves a continuous state environment, the optimal action policy is learnt using a model-free, off-policy Temporal Difference (TD) algorithm called Q-learning, where the value \(Q(s,a)\) denotes the effectiveness of a state-action pair. The TD(0) Q-learning algorithm is given by,

$$\begin{aligned} Q(s_{t},a) = Q(s_{t},a) + \alpha \left[ r_{t} + \gamma \max _{a'}Q(s_{t+1},a') - Q(s_{t},a)\right] \end{aligned}$$
(3)

where \(r_{t} \in R^n\) is the immediate reward obtained for performing action a in state \(s_t\), \(\gamma \in [0,1]\) is the factor by which future rewards are discounted, and \(\alpha \in [0,1]\) is the learning rate. In our problem, reinforcement learning is formulated as a classification problem, where the IFS representation is the current state s and the action a is the output label to be predicted; the policy \(\pi \) thus learns the data-label relation for the given data. Due to the large, probabilistic, and continuous definition of the state space, the Q-values are approximated using a universal function approximator, i.e., a neural network [26].

$$\begin{aligned} Q(s,a) = \psi (s,a,\theta ) = \sum _i \phi _i(s,a).\theta _i = \phi ^{T}(s,a).\theta \end{aligned}$$
(4)

where \(\phi \) is the approximation function. Using the residual Q-learning algorithm [21], the free parameters \(\theta \) are updated as follows:

$$\begin{aligned} \theta _{t+1} = \theta _t - \alpha .\psi .\varDelta \psi \end{aligned}$$
(5)
$$\begin{aligned} \begin{aligned} \theta _{t+1} = \theta _t - \alpha \left[ r_{t} + \gamma \max _{a'}Q(s_{t+1},a') - Q(s_{t},a)\right]&\\ \times \left[ \beta \gamma \frac{\partial }{\partial \theta }\max _{a'}Q(s_{t+1},a')-\frac{\partial }{\partial \theta }Q(s_{t},a) \right] \end{aligned} \end{aligned}$$
(6)

where \(\beta \) is the weighting factor of the Bellman residual term. Baird [1] established the convergence of this approximate Q-learning update; the details are omitted for brevity. An \(\epsilon \)-greedy exploration strategy is adopted, in which a random action is chosen with probability \(\epsilon \) in every state. As observed in [16], “the crucial factor for a successful approximate algorithm is the choice of the parametric approximation architecture and the choice of the projection (parameter adjustment) method(s)”. The choice of reward function is highly important and directly determines the effectiveness of the adaptation; it is explained in the next section.
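The residual update and the exploration rule can be sketched as follows. This is only an illustration under simplifying assumptions: the neural network is replaced by a linear approximator with one weight vector per action, so that \(\partial Q(s,a)/\partial \theta \) is simply the state vector placed in the block of action a, and the step is written so that it descends the squared Bellman residual. The function names and the toy values are hypothetical.

```python
import random


def q_values(theta, state):
    """Linear form of Eq. 4 with one weight vector per action."""
    return [sum(w * s for w, s in zip(theta_a, state)) for theta_a in theta]


def choose_action(theta, state, epsilon, rng):
    """Random action with probability epsilon, greedy action otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(theta))
    q = q_values(theta, state)
    return q.index(max(q))


def residual_update(theta, s, a, reward, s_next, alpha, gamma, beta):
    """One residual Q-learning step; theta is modified in place.

    Returns the TD error r + gamma * max_a' Q(s', a') - Q(s, a).
    """
    q_next = q_values(theta, s_next)
    a_next = q_next.index(max(q_next))
    delta = reward + gamma * q_next[a_next] - q_values(theta, s)[a]
    # Build beta*gamma*dQ(s', a*)/dtheta - dQ(s, a)/dtheta block by block.
    grad = [[0.0] * len(s) for _ in theta]
    for k, v in enumerate(s_next):
        grad[a_next][k] += beta * gamma * v
    for k, v in enumerate(s):
        grad[a][k] -= v
    # Step against the gradient of the squared Bellman residual.
    for b in range(len(theta)):
        for k in range(len(s)):
            theta[b][k] -= alpha * delta * grad[b][k]
    return delta


# Toy problem: 2 actions (labels), 2-dimensional state.
theta = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)
action = choose_action(theta, [1.0, 0.0], epsilon=0.05, rng=rng)
delta = residual_update(theta, [1.0, 0.0], 0, 1.0, [0.0, 1.0],
                        alpha=0.5, gamma=0.9, beta=0.2)
print(delta, theta)
```

After this single step the estimate Q(s, 0) moves from 0 towards the observed reward, while the weights of the untouched action stay at zero. Setting `beta=0` recovers the usual semi-gradient Q-learning update.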

2.3 Design of Reward Function

The Intuition-based Feature Subspace (IFS) is defined by the cluster centroid points obtained using unlabeled data for every feature q as \([z_{u_q}^{(1)}, z_{u_q}^{(2)}, \ldots , z_{u_q}^{(C)}]\), where \(q = [1, 2, \ldots , r]\). This space provides an organized definition of how the entire set of unlabeled data is observed and inferred. From the various features of the labeled training data \([(x_{l_q}^{(1)},y^{(1)}), (x_{l_q}^{(2)},y^{(2)}), \ldots , (x_{l_q}^{(m)},y^{(m)})]\), where \(q \in [1, 2, \ldots , r]\), the centroid points for every feature and every class are calculated as \([z_{l_q}^{(1)}, z_{l_q}^{(2)}, \ldots , z_{l_q}^{(C)}]\), where \(q = [1, 2, \ldots , r]\). The space formed by these centroid points, called the Labeled data Feature Subspace (LFS), provides an inference of the particular learning task to be performed. It is to be noted that:

  • Apart from unlabeled data, every labeled training sample (and even every test sample) is incrementally added to the IFS, as the observed data affects the overall understanding of features.

  • The aim of incremental learning is to shape the IFS as close as possible to the LFS while learning the feature-label mapping using reinforcement learning.

The incremental update of the IFS for the ith training example belonging to the jth class is given by the following equation:

$$\begin{aligned} z_{u_q}^{(j)} = z_{u_q}^{(j)} + \left( \frac{x_{l_q}^{(i)} - z_{u_q}^{(j)}}{n_{q}^{j}} \right) \end{aligned}$$
(7)

for \(q = [1, 2, \ldots , r]\), where \(n_{q}^{j}\) is the number of data points in the jth cluster for the qth feature. Further, to learn effectively from this incremental update, the reward function is defined as a function of the distance between the current IFS and the LFS, as follows:

$$\begin{aligned} r_{t} = \left( || z_{u_{q}, t}^{(j)} - z_{l_q}^{(j)} ||_{2}\right) ^{-1} \end{aligned}$$
(8)

for \(q = [1, 2, \ldots , r]\), \(j = [1, 2, \ldots , C]\) at a given time t.
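Eqs. 7 and 8 can be illustrated for a single feature type and a single cluster. In this sketch the names `update_centroid` and `reward` are hypothetical, the cluster size n is assumed to already count the newly added sample, and a small constant `eps` is added to avoid division by zero when the two centroids coincide; neither assumption is stated in the text.

```python
import math


def update_centroid(centroid, x, n):
    """Eq. 7: move the IFS centroid towards x by 1/n of the gap."""
    return [z + (xi - z) / n for z, xi in zip(centroid, x)]


def reward(ifs_centroid, lfs_centroid, eps=1e-12):
    """Eq. 8: inverse Euclidean distance between IFS and LFS centroids.

    eps is an assumption of this sketch, guarding against a zero distance.
    """
    return 1.0 / (math.dist(ifs_centroid, lfs_centroid) + eps)


z_ifs = [0.0, 0.0]   # current IFS centroid for class j
z_lfs = [3.0, 4.0]   # LFS centroid for class j
before = reward(z_ifs, z_lfs)
z_ifs = update_centroid(z_ifs, [3.0, 4.0], n=2)
after = reward(z_ifs, z_lfs)
print(z_ifs)           # [1.5, 2.0]
print(after > before)  # True
```

The reward grows as the IFS centroid approaches the LFS centroid, which is the behaviour the incremental adaptation is meant to encourage.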

Algorithm 1

2.4 Context-Dependent Addition Mechanism

The intuition learning framework acts as a supplement to (rather than a complement of) supervised learning. The need for intuition arises only when the confidence of the supervised learner falls below a particular threshold. Therefore, a context-dependent mechanism is designed to support supervised learning with intuition only when required. For given labeled training data \(\{I_l^{(1)}, I_l^{(2)}, \ldots , I_l^{(m)}\}\), some handcrafted or unsupervised features are extracted, \(\{(x_l^{(1)},y^{(1)}), (x_l^{(2)},y^{(2)}), \ldots , (x_l^{(m)},y^{(m)})\}\), and a supervised model is learnt, \(H_{s}:\left( x_{l}^{(i)}\rightarrow \hat{y}_{s}\right) \). Based on the supervised learning algorithm, the classification confidence is computed for the ith data point and is given as \(conf_{s}^{(i)} = [cs_{1}^{(i)}, cs_{2}^{(i)},\ldots , cs_{C}^{(i)}]\). The mechanism used to calculate the classification confidence depends on the supervised learning model used. Similarly, intuition learning can be represented as \(H_{int}:\left( s^{(i)}\rightarrow \hat{y}_{int}\right) \), and its classification confidence is the output of the last layer of the value function approximation neural architecture, given as \(conf_{int}^{(i)} = [cint_{1}^{(i)}, cint_{2}^{(i)},\ldots , cint_{C}^{(i)}]\). A label switching mechanism is applied to give the final predicted label, \(\hat{y}\), as follows:

$$\begin{aligned} \hat{y}= {\left\{ \begin{array}{ll} \hat{y}_{s}, &{} \varDelta > \text {th } \\ \hat{y}_{new}, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(9)

where th is the threshold for using intuition and the context condition \(\varDelta \), i.e., the margin between the two highest supervised confidence scores, is calculated as follows:

$$\begin{aligned} \varDelta = \max _{j}\left( cs_{j}^{(i)} \right) - \max _{l \ne j}\left( cs_{l}^{(i)} \right) \end{aligned}$$
(10)

In cases where intuition is used to boost the confidence of the supervised classifier, the new label is computed as follows:

$$\begin{aligned} cnew_{k}^{(i)} = \lambda .cs_{k}^{(i)} + (1-\lambda ).cint_{k}^{(i)} \end{aligned}$$
(11)
$$\begin{aligned} \hat{y}_{new} = \arg \max _{j}\left( cnew_{j}^{(i)} \right) \end{aligned}$$
(12)

where \(\lambda \) is the trade-off parameter between intuition and supervised learning. Thus, in simple words, we add the feeling of intuition to an algorithm. The entire approach is summarized in Algorithm 1.
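The label switching rule of Eqs. 9 to 12 can be sketched in a few lines. The function name `combine` and the example confidence vectors are hypothetical, class indices are 0-based here rather than 1 to C, and at least two classes are assumed.

```python
def combine(conf_s, conf_int, th, lam):
    """Return the final label index following Eqs. 9 to 12."""
    ranked = sorted(conf_s, reverse=True)
    delta = ranked[0] - ranked[1]  # Eq. 10: margin between the top two scores
    if delta > th:                 # Eq. 9: the supervised learner is confident
        return conf_s.index(ranked[0])
    # Eq. 11: weighted combination of the two confidence vectors.
    conf_new = [lam * cs + (1 - lam) * ci for cs, ci in zip(conf_s, conf_int)]
    return conf_new.index(max(conf_new))  # Eq. 12


intuition = [0.1, 0.8, 0.1]
print(combine([0.95, 0.03, 0.02], intuition, th=0.9, lam=0.5))  # 0
print(combine([0.50, 0.45, 0.05], intuition, th=0.9, lam=0.5))  # 1
```

In the first call the supervised margin exceeds the threshold, so the supervised label is kept. In the second call the margin is small, and the combined confidence shifts the decision to the class favoured by intuition.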

3 Experimental Analysis

3.1 Dataset

The proposed intuition learning algorithm is applied to a 10-class classification problem using the CIFAR-10 database [15]. The database contains 60,000 labeled color images, each of size \(32\times 32\), pertaining to 10 classes (i.e., 6,000 images per class). There are 50,000 training images and 10,000 test images. The data set contains small images with limited and noisy information content, which makes it a relevant case study to demonstrate the effectiveness of the proposed paradigm. The STL-10 database [7] is used as the unlabeled image data set, having one million color images of size \(96\times 96\). As shown in Table 2, six different feature representations are extracted from the images. These features cover the various types of features that could be extracted from image data. For all the experiments, five times random cross-validation is performed and the best model accuracy is reported. Sample images from the CIFAR-10 and STL-10 datasets are shown in Fig. 4.

Table 2 Details of different features extracted from the image data
Fig. 4 Sample labeled images from the CIFAR-10 database and unlabeled images from the STL-10 database

Fig. 5 Data clusters for each of the extracted features; the grid depicts the cluster density at local regions. a shows the IFS of all the unlabeled data, b shows the adapted task-specific feature space after 300 epochs of learning, and c shows the amount of change happening in the cluster after the addition of an image. Best viewed in color

3.2 Interpreting Intuition-Based Feature Subspace

The primary aim of the approach is to construct the feature subspace completely from unlabeled data and to adapt it to a specific learning task. Figure 5 shows the clusters of the entire unlabeled data corresponding to every feature extracted. The concatenation of the feature spaces shown in Fig. 5a represents the IFS. Figure 5b shows the adapted task-specific feature subspace after performing 300 epochs of learning with the given labeled data. Figure 5c shows the amount of update in the cluster after adding an image, obtained by calculating the dissimilarity between the cluster centroids before and after the addition of the image. Cluster dissimilarity is calculated for the rth feature representation as follows:

$$\begin{aligned} C_{dis} = \sum _{j=1}^{C} \frac{1}{D_{j}}.||z_{u_r}^{(j)}|_{(t+1)} - z_{u_r}^{(j)}|_{(t)}||_{2} \end{aligned}$$
(13)

where \(D_{j}\) is the density of the jth cluster. It can be visually observed from the plot that the shape, gist, and LBP feature spaces are updated (i.e., they learn) after the addition of each image, indicating that these features contribute more towards the classification task. However, the color Harris and autocorrelogram features are not updated much by the training data.
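Eq. 13 can be computed directly from the centroids before and after an update. In this sketch the function name `cluster_dissimilarity` is hypothetical and the densities \(D_{j}\) are assumed to be given as positive numbers, since their exact definition is not spelled out in the text.

```python
import math


def cluster_dissimilarity(before, after, density):
    """Eq. 13: density-weighted centroid shift summed over the C clusters."""
    return sum(
        math.dist(z_new, z_old) / d_j
        for z_old, z_new, d_j in zip(before, after, density)
    )


before = [[0.0, 0.0], [1.0, 1.0]]  # centroids at time t
after = [[3.0, 4.0], [1.0, 1.0]]   # centroids at time t + 1
density = [5.0, 2.0]               # assumed densities D_j
print(cluster_dissimilarity(before, after, density))  # 1.0
```

Only the first centroid moves (by a distance of 5), and dividing by its density of 5 gives a dissimilarity of 1; dense clusters therefore contribute less for the same shift.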

3.3 Performance Analysis

It is to be noted that the intuition learning framework can be used to supplement any supervised or semi-supervised learning mechanism. In this research, we show the results in the following scenarios:

  1. Using two supervised learning algorithms (a backpropagation neural network and a multi-class SVM) with Uniform Circular Local Binary Pattern (UCLBP) [19] features. Labeled data, \([(x_{l_q}^{(1)},y^{(1)}), (x_{l_q}^{(2)},y^{(2)}), \ldots , (x_{l_q}^{(m)},y^{(m)})]\), from CIFAR-10 are used to train the supervised algorithms.

  2. Using a semi-supervised learning algorithm, with a neural network as the classifier and UCLBP features trained on the CIFAR-10 dataset. The semi-supervised algorithm used for comparison is one approach to self-taught learning [23], with unlabeled data from the STL-10 dataset, \(\{(s^{(1)},y^{(1)}), (s^{(2)},y^{(2)}), \ldots , (s^{(m)}, y^{(m)})\}\).

  3. Using the intuition learning framework only, with the intuition-based task-specific feature representation combined with continuous state reinforcement learning (Q-learning) in Eq. 4 for classification.

  4. Using a supervised intuition framework, where the outputs of the supervised learning algorithm and the intuition learning framework are combined using the context-dependent addition mechanism in Eq. 12.

  5. Using a semi-supervised intuition framework, where the outputs of the semi-supervised learning algorithm and the intuition learning framework are combined using the context-dependent addition mechanism.

The optimized values of the parameters used in our framework are as follows: \(\alpha =0.99, \gamma =0.95, \beta =0.2, th=0.9, \lambda =0.5\), and \(\epsilon =0.05\). Features are preprocessed using z-score normalization. All the experiments are performed on a server with an Intel Xeon E5-2640 0 processor (2.50 GHz) and 64 GB of RAM.

Table 3 The performance accuracy (%) of supervised intuition learning is compared with supervised (neural network) and semi-supervised (self-taught) learning methods. The significance of intuition is studied by varying the amount of available training data. 5 times random cross-validation is performed and the best model's performance is reported
Table 4 The influence of supplementing intuition to supervised and semi-supervised algorithm is shown by improvement in the performance accuracy (%)
Table 5 The performance accuracy (%) of supervised and supervised intuition framework using SVM classifier is studied
Fig. 6 Examples of a success and b failure cases of the proposed intuition learning. AL \(=\) actual ground truth label, SL \(=\) label predicted by the supervised neural network learner, and IL \(=\) label predicted when intuition is combined with supervised neural network learner

As already discussed, intuition is more significant in challenging problems with limited training data. Tables 3, 4, and 5 show the performance of the proposed intuition learning in comparison with other learning methods, with the training size varied as a parameter. It can be observed that with enough training data, supervised algorithms (both neural network and SVM) yield the best classification performance. However, as the size of the training data decreases, the performance of all three algorithms (supervised, semi-supervised, and intuition learning) reduces. The results show that in such a scenario, incorporating intuition with a supervised or semi-supervised algorithm yields improved results. This supports our hypothesis that adding intuition improves performance under challenging circumstances such as limited training data. Similarly, from a human's perspective, in the presence of all data and information, one may take correct decisions. However, when the background training information is limited, intuition helps. Further, some key analyses are summarized below:

Fig. 7 A plot of the cumulative error at each epoch, empirically showing the learning effectiveness of the residual Q-learning performed in Eq. 6

  1. To study the effectiveness of residual learning in Eq. 6, the training error over successive epochs is plotted, as shown in Fig. 7, for a training size of 10,000. It can be observed that the training error gradually decreases and remains constant after 300 epochs, indicating that the maximum training capacity has been achieved with the minimum training error.

  2. The computation time required for intuition learning depends on the complexity of the r features that are extracted. However, for one sample, under the assumption that the feature extraction happens off-line, the overall intuition decision and feature space can be generated in 0.082 s, while the supervised decision takes \({\sim }4\) s on average. This shows that the intuition decision is much faster and requires less effort than supervised decision-making.

  3. In Fig. 6, some success and failure cases are shown, where (a) intuition helps in correctly classifying a data point on which supervised learning fails and (b) a data point is incorrectly classified because of intuition. As previously discussed, intuition can go wrong sometimes. Upon analyzing the first horse example in the failure cases (Fig. 6b), it is observed that horses are clustered more towards brown in the autocorrelogram color feature space. However, as the horse shown in the image is white, it gets clustered along with cats and is misclassified by intuition learning.

4 Conclusion

Inspired by the human capability of instinctive reasoning, this research presents an intuition learning framework that supplements a classifier for improved performance, especially with limited training data. Intuition is modeled as continuous state reinforcement learning that adapts to a particular task using a large amount of unlabeled data and limited task-specific data. The performance of intuition is shown on a 10-class image classification problem, in comparison with supervised, semi-supervised, and reinforcement learning. The results indicate that the application of intuition improves the performance of the classifier with limited training data.