1 Supervised Learners

Machine Learning seeks to learn from known data and apply what is learned to previously unseen data. Classification, or Supervised Learning, is one of the core Machine Learning tasks: one learns to assign a label (class) given a vector of predictors. Interested readers may find a summary introduction in [1] and a deeper treatment in [2]; there are several other classics on the subject [3,4,5], and for those interested in managing large-scale machine intelligence projects, [6] is an excellent source (Fig. 1).

Fig. 1. Machine learning classifier tree

There are many Classification algorithms, as shown above, and one is faced with a dilemma: which algorithm should one use for a given dataset?

1.1 Satisficing Solution

Satisficing decision-making, as discussed in [6], is a heuristic whereby people settle for a solution that is ‘good enough’ but may not be the optimal one. A “Satisficing Solution” can be considered a vernacular description of Occam’s Razor [7, 8]. The notion of a Satisficing Solution does not run counter to the well-known No Free Lunch Theorem [10] in Machine Learning: in combination with the razor, a satisficing solution is good enough.

1.2 No Free Lunch (NFL) Theorem

There is considerable debate [9, 10] about the NFL theorem [11] as to its meaning and interpretation, and there is even an organization dedicated to NFL [12].

1.3 What Is Cost?

If it cannot be free, what is the cost? As outlined in [6] and [13], misclassification error is not the only cost. There are other costs including:

  a) demand on memory,

  b) processing time, and

  c) interpretability.

1.4 Need for Automated Algorithm Selector

In our opinion, as M/L is adopted more and more, the most impactful consideration for practitioners is that no single classifier can outperform all others in all domains. Consequently, it is imperative for practitioners to ask the fundamental question posed in [14], “Among all the available classification algorithms, and in considering a specific type of data and cost, which is the best algorithm for my problem?”, before settling on a particular algorithm. As the number of practitioners increases, the ability to run a model will cease to be an advantage, and the need for automating the algorithm selection process will become all too important and immediate. There have been several experiments comparing classifier performance [15,16,17,18], but none is available as a service to practitioners.

In this paper we present our effort, the Swiss Army Knife for No Free Lunch (NFL-SAK), to make lunch free for anyone with a dataset. Consistent with Occam’s Razor, we allow users to submit a dataset, provide some hints about the structure of the data, and run several established classification algorithms of different types (parametric, instance based, logic based, ensemble and stacked generalization). NFL-SAK presents a useful tabulation of performance metrics; in its current form we report Area Under the Curve (AUC) [20] and Accuracy. There are several other performance metrics (see chapter 7 of Practical Data Mining [6] for a detailed overview), and we plan to incorporate them in later revisions.

2 Implementation

Given a dataset, a model Formula, and a set of algorithms, the NFL-SAK platform performs classification with each algorithm in the set; a minimal sketch of this driver follows the package list below. The system uses readily available packages in R [21], including:

  1. library(DMwR)

  2. library(caret)

  3. library(e1071)

  4. library(pROC)

  5. library(randomForest)

  6. library(rattle)

  7. library(rpart)
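As a rough illustration of this driver, the following is a minimal sketch under the assumption that caret method codes identify the learners; the function name run_learners and its defaults are ours, not NFL-SAK’s actual API:

  # Minimal sketch (illustrative, not NFL-SAK's internals): fit each selected
  # caret method on the data at its default settings -- no parameter tuning,
  # in keeping with Occam's Razor.
  library(caret)

  run_learners <- function(formula, data,
                           methods = c("glm", "knn", "rpart", "svmRadial", "rf")) {
    # returns a named list of fitted caret models, one per method code
    lapply(setNames(methods, methods), function(m)
      train(formula, data = data, method = m))
  }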

The process is intuitive as shown below (Fig. 2):

Fig. 2. NFL-SAK process and user interaction frameworks

The Shiny UI implements a “Classify By Example” model where the practitioner can specify one or more classifiers, the dependent (class) variable and the independent variables. Consistent with Occam’s Razor, each selected classifier is run with the simplest default model, without parameter tuning, and the results are displayed for review (Fig. 3).

Fig. 3. User’s first interaction with NFL-SAK: the user names the experiment and specifies the dataset.
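A minimal Shiny sketch of such an input panel might look as follows; all widget IDs, labels and choices here are illustrative and are not NFL-SAK’s actual code:

  # Illustrative "Classify By Example" input panel
  library(shiny)

  ui <- fluidPage(
    textInput("experiment", "Experiment name"),
    fileInput("dataset", "Dataset (CSV)"),
    sliderInput("train_frac", "Training proportion", min = 0.5, max = 0.95, value = 0.7),
    selectInput("class_var", "Class (dependent) variable", choices = NULL),
    selectInput("predictors", "Predictor (independent) variables",
                choices = NULL, multiple = TRUE),
    checkboxGroupInput("learners", "Learners to evaluate",
                       choices = c("Logistic Regression", "kNN", "Decision Tree",
                                   "SVM", "Random Forest")),
    actionButton("run", "Run experiment")
  )
  # The server side (omitted) would read the uploaded file, populate the
  # variable selectors and trigger training when the button is pressed.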

2.1 Dataset Specification

Here we have loaded the Hepatitis dataset [29,30,31]. We want to use 70% of it for training and the rest for testing.
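A sketch of this split, assuming the data has been read into a data frame named hepatitis with the class label in a column named Class (both names are illustrative):

  # Stratified 70/30 split with caret
  library(caret)

  set.seed(42)                                   # reproducible partition
  inTrain  <- createDataPartition(hepatitis$Class, p = 0.70, list = FALSE)
  trainSet <- hepatitis[inTrain, ]
  testSet  <- hepatitis[-inTrain, ]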

2.2 Modeling Specification

Users can specify the model Formula and train one or more of the learners. Here we have identified the class variable and the list of learners we want to evaluate. Note that one parametric classifier (Logistic Regression), one instance based classifier (kNN), one logic based classifier (Decision Tree), a Support Vector Machine and RandomForest (an ensemble classifier) are specified; stacking with voting is also run by default (Fig. 4).

Fig. 4. Experimental design specification
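An illustrative call corresponding to this specification, reusing the run_learners sketch above and the trainSet partition from Sect. 2.1 (“glm” is caret’s method code for logistic regression on a two-class factor outcome):

  # Formula Class ~ . with five learners, all at caret's defaults
  models <- run_learners(Class ~ ., data = trainSet,
                         methods = c("glm",        # logistic regression (parametric)
                                     "knn",        # k-nearest neighbours (instance based)
                                     "rpart",      # decision tree (logic based)
                                     "svmRadial",  # support vector machine
                                     "rf"))        # random forest (ensemble)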

Now we will run the model and review the results.

2.3 Model Output

First, numeric performance measures, including Accuracy and AUC, are presented (Figs. 5 and 6).

Fig. 5. Table of numeric performance metrics.

Fig. 6. Table of classifier performance: Accuracy and AUC.
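As a sketch of how one row of such a table could be computed, assuming a binary class and the testSet partition from Sect. 2.1, where fit stands for any one fitted caret model (e.g. models$rf):

  # Accuracy and AUC for one fitted model on the held-out test set
  library(caret)
  library(pROC)

  pred_class <- predict(fit, newdata = testSet)                 # predicted labels
  pred_prob  <- predict(fit, newdata = testSet, type = "prob")  # class probabilities

  accuracy <- confusionMatrix(pred_class, testSet$Class)$overall["Accuracy"]
  roc_obj  <- roc(response = testSet$Class, predictor = pred_prob[[2]])
  auc_val  <- auc(roc_obj)                                      # area under the ROC curve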

Modest visualizations are presented, allowing one to compare the relative measures. We used shiny [22] and shinyWidgets [23] to generate these visualizations; without the swarm wisdom available from netizens [32], none of this would have been possible, given that we are unfunded and staffed by one TA, one volunteer and one undergraduate student.

Results of stacked generalization are presented below (Fig. 7).

Fig. 7. ROC curves with and without stacked generalization

For the Hepatitis dataset, ROC curves for the specified classifiers, Logistic Regression, Decision Tree, Nearest Neighbor, Support Vector Machine and RandomForest, are shown together with the stacked generalizer built from them. The stacked generalizer, combining all the above classifiers including the ensemble classifier, achieves the highest performance of 0.899.
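A minimal sketch of the voting combination described above; this is our own illustration of majority voting over the base learners’ predicted labels, not NFL-SAK’s actual implementation:

  # Majority vote over the predicted labels of a list of fitted caret models
  vote <- function(models, newdata) {
    votes <- sapply(models, function(m) as.character(predict(m, newdata = newdata)))
    # one column of votes per model; pick the most frequent label in each row
    apply(votes, 1, function(v) names(which.max(table(v))))
  }

  # e.g. stacked_pred <- factor(vote(models, testSet), levels = levels(testSet$Class))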

3 Conclusion

In this paper we summarized the import of the No Free Lunch theorem and the efficacy of Occam’s Razor in searching for the best performing classifier for a given dataset. Guided by Occam’s Razor, weak learners are trained at their default configurations. The user is allowed to pick and choose algorithms and to specify a training set proportion. The system then runs the stacked generalizer using a voting mechanism. Comparative performance measures are displayed as Accuracy and AUC, and ROC curves are generated for the specified algorithms. Users can perform multiple experiments and save them for further analysis.