A Short Tour of the Predictive Modeling Process

Kuhn, Max; Johnson, Kjell

doi:10.1007/978-1-4614-6849-3_2


Abstract

To begin Part I of this work, we present a simple example that illustrates the broad concepts of model building. Section 2.1 provides an overview of a fuel economy data set for which the objective is to predict vehicles' fuel economy based on standard vehicle predictors such as engine displacement, number of cylinders, type of transmission, and manufacturer. In the context of this example, we explain the concepts of “spending” data, estimating model performance, building candidate models, and selecting the optimal model (Section 2.2).



Before diving into the formal components of model building, we present a simple example that illustrates its broad concepts. Specifically, the example demonstrates the concepts of data “spending,” building candidate models, and selecting the optimal model.

1 Case Study: Predicting Fuel Economy

The fueleconomy.gov web site, run by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy and the U.S. Environmental Protection Agency, lists different estimates of fuel economy for passenger cars and trucks. For each vehicle, various characteristics are recorded such as the engine displacement or number of cylinders. Along with these values, laboratory measurements are made for the city and highway miles per gallon (MPG) of the car.

In practice, we would build a model on as many vehicle characteristics as possible in order to find the most predictive model. However, this introductory illustration will focus on high-level concepts of model building by using a single predictor, engine displacement (the volume inside the engine cylinders), and a single response, unadjusted highway MPG for 2010–2011 model year cars.

The first step in any model building process is to understand the data, which can most easily be done through a graph. Since we have just one predictor and one response, these data can be visualized with a scatter plot (Fig. 2.1). This figure shows the relationship between engine displacement and fuel economy. The “2010 model year” panel contains all the 2010 data while the other panel shows the data only for new 2011 vehicles. Clearly, as engine displacement increases, the fuel efficiency drops regardless of year. The relationship is somewhat linear but does exhibit some curvature towards the extreme ends of the displacement axis.

If we had more than one predictor, we would need to further understand characteristics of the predictors and the relationships among the predictors. These characteristics may suggest important and necessary pre-processing steps that must be taken prior to building a model (Chap. 3).

After first understanding the data, the next step is to build and evaluate a model on the data. A standard approach is to take a random sample of the data for model building and use the rest to understand model performance. However, suppose we want to predict the MPG for a new car line. In this situation, models can be created using the 2010 data (containing 1,107 vehicles) and tested on the 245 new 2011 cars. The common terminology would be that the 2010 data are used as the model “training set” and the 2011 values are the “test” or “validation” set.
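The year-based split described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`model_year`, `displacement`, `highway_mpg`), and the values are hypothetical and are not taken from the fuel economy data.

```python
# Hypothetical records; field names and values are illustrative only.
vehicles = [
    {"model_year": 2010, "displacement": 2.0, "highway_mpg": 41.0},
    {"model_year": 2010, "displacement": 3.5, "highway_mpg": 34.0},
    {"model_year": 2010, "displacement": 5.7, "highway_mpg": 26.0},
    {"model_year": 2011, "displacement": 1.6, "highway_mpg": 44.0},
    {"model_year": 2011, "displacement": 6.2, "highway_mpg": 24.0},
]


def split_by_model_year(records, train_year=2010, test_year=2011):
    """Return (training set, test set), selected by model year."""
    train = [r for r in records if r["model_year"] == train_year]
    test = [r for r in records if r["model_year"] == test_year]
    return train, test


training_set, test_set = split_by_model_year(vehicles)
print(len(training_set), len(test_set))
```

Splitting on model year rather than at random mirrors the intended use of the model, which is to predict a new car line.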

Now that we have defined the data used for model building and evaluation, we should decide how to measure performance of the model. For regression problems where we try to predict a numeric value, the residuals are important sources of information. Residuals are computed as the observed value minus the predicted value (i.e., $y_i - \widehat{y}_i$). When predicting numeric values, the root mean squared error (RMSE) is commonly used to evaluate models. Described in more detail in Chap. 7, RMSE is interpreted as how far, on average, the residuals are from zero.
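As a small worked example, the RMSE is the square root of the mean squared residual, $\sqrt{\tfrac{1}{n}\sum_{i}(y_i - \widehat{y}_i)^2}$. The sketch below computes it in Python; the function names and the three observed and predicted values are made up for illustration.

```python
import math


def residuals(observed, predicted):
    """Observed minus predicted, i.e. y_i - yhat_i for each sample."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def rmse(observed, predicted):
    """Root mean squared error: sqrt of the average squared residual."""
    res = residuals(observed, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


# Hypothetical MPG values, used only to show the arithmetic.
observed_mpg = [30.0, 25.0, 40.0]
predicted_mpg = [28.0, 27.0, 41.0]

print(residuals(observed_mpg, predicted_mpg))  # [2.0, -2.0, -1.0]
print(rmse(observed_mpg, predicted_mpg))       # sqrt(9 / 3) = sqrt(3), about 1.732
```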

At this point, the modeler will try various techniques to mathematically define the relationship between the predictor and outcome. To do this, the training set is used to estimate the various values needed by the model equations. The test set will be used only when a few strong candidate models have been finalized (repeatedly using the test set in the model-building process negates its utility as a final arbiter of the models).

Suppose a linear regression model was created where the predicted MPG is a basic slope and intercept model. Using the training data, we estimate the intercept to be 50.6 and the slope to be −4.5 MPG per liter using the method of least squares (Sect. 6.2). The model fit is shown in Fig. 2.2 for the training set data (see Note 1). The left-hand panel shows the training set data with a linear model fit defined by the estimated slope and intercept. The right-hand panel plots the observed and predicted MPG. These plots demonstrate that this model misses some of the patterns in the data, such as under-predicting fuel efficiency when the displacement is less than 2 L or above 6 L.
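For a single predictor, the least squares estimates have a simple closed form, sketched below in Python. The four data points are synthetic: they were constructed to lie exactly on the reported line (intercept 50.6, slope −4.5), so the fit recovers those values. They are not observations from the training set, and the function name is our own.

```python
def fit_simple_linear(x, y):
    """Least-squares intercept and slope for a one-predictor linear model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic points placed exactly on MPG = 50.6 - 4.5 * displacement.
displacement = [1.0, 2.0, 4.0, 6.0]
mpg = [46.1, 41.6, 32.6, 23.6]

intercept, slope = fit_simple_linear(displacement, mpg)
print(round(intercept, 1), round(slope, 1))  # 50.6 -4.5
```

With real data the points scatter around the line, and the same formulas give the slope and intercept that minimize the sum of squared residuals.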

When working with the training set, one must be careful not to simply evaluate model performance using the same data used to build the model. If we simply re-predict the training set data, there is the potential to produce overly optimistic estimates of how well the model works, especially if the model is highly adaptable. An alternative approach for quantifying how well the model operates is to use resampling, where different subsets of the training data set are used to fit the model. Resampling techniques are discussed in Chap. 4. For these data, we used a form of resampling called 10-fold cross-validation to estimate the model RMSE to be 4.6 MPG.
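A minimal sketch of 10-fold cross-validation for the straight-line model is given below. It makes several assumptions: the data are synthetic (a noisy line, not the fuel economy data), folds are formed by shuffling with a fixed seed, and the per-fold RMSE values are averaged, which is one common convention and not necessarily the exact procedure used for the figures quoted above.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(x, y, k=10, seed=0):
    """Average held-out RMSE over k folds of the training set."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    fold_rmse = []
    for held_out in folds:
        held = set(held_out)
        keep = [i for i in indices if i not in held]
        # Fit on the other k - 1 folds only.
        intercept, slope = fit_line([x[i] for i in keep], [y[i] for i in keep])
        # Score on the fold that the fit never saw.
        squared = [(y[i] - (intercept + slope * x[i])) ** 2 for i in held_out]
        fold_rmse.append(math.sqrt(sum(squared) / len(squared)))
    return sum(fold_rmse) / k


# Synthetic training data: a noisy line, NOT the fuel economy data.
rng = random.Random(1)
displacement = [rng.uniform(1.0, 8.0) for _ in range(200)]
mpg = [50.6 - 4.5 * d + rng.gauss(0.0, 2.0) for d in displacement]

cv_rmse = cross_validated_rmse(displacement, mpg)
print(round(cv_rmse, 2))
```

Because every sample is predicted by a model that was fit without it, the resulting estimate is less optimistic than re-predicting the training set.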

Looking at Fig. 2.2, it is conceivable that the problem might be solved by introducing some nonlinearity in the model. There are many ways to do this. The most basic approach is to supplement the previous linear regression model with additional complexity. Adding a squared term for engine displacement would mean estimating an additional slope parameter associated with the square of the predictor. In doing this, the model equation changes to

$$\text{efficiency} = 63.2 - 11.9 \times \text{displacement} + 0.94 \times \text{displacement}^{2}$$

This is referred to as a quadratic model since it includes a squared term; the model fit is shown in Fig. 2.3. Unquestionably, the addition of the quadratic term improves the model fit. The RMSE is now estimated to be 4.2 MPG using cross-validation. One issue with quadratic models is that they can perform poorly on the extremes of the predictor. In Fig. 2.3, there may be a hint of this for the vehicles with very high displacement values. The model appears to be bending upwards unrealistically. Predicting new vehicles with large displacement values may produce significantly inaccurate results.
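The upward bend can be checked directly from the fitted coefficients. The short Python sketch below evaluates the quadratic equation at a few displacement values and locates the turning point of the parabola, which lies at $11.9 / (2 \times 0.94)$ liters. The function name and the chosen displacement values are ours.

```python
def quadratic_mpg(displacement):
    """Predicted highway MPG from the fitted quadratic equation in the text."""
    return 63.2 - 11.9 * displacement + 0.94 * displacement ** 2


# The parabola stops decreasing where its derivative is zero.
turning_point = 11.9 / (2 * 0.94)

print(round(quadratic_mpg(2.0), 2))  # 43.16
print(round(quadratic_mpg(6.0), 2))  # 25.64
print(round(quadratic_mpg(8.0), 2))  # 28.16
print(round(turning_point, 2))       # 6.33
```

Beyond roughly 6.3 L the predicted fuel economy starts to rise again, so an 8 L engine is predicted to be more efficient than a 6 L engine. This is the unrealistic behavior at the extremes noted above.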

Chapters 6–8 discuss many other techniques for creating sophisticated relationships between the predictors and outcome. One such approach is the multivariate adaptive regression spline (MARS) model (Friedman, 1991). When used with a single predictor, MARS can fit separate linear regression lines for different ranges of engine displacement. The slopes and intercepts are estimated for this model, as well as the number and size of the separate regions for the linear models. Unlike the linear regression models, this technique has a tuning parameter which cannot be directly estimated from the data. There is no analytical equation that can be used to determine how many segments should be used to model the data. While the MARS model has internal algorithms for making this determination, the user can try different values and use resampling to determine the appropriate value. Once the value is found, a final MARS model would be fit using all the training set data and used for prediction.
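To make the idea of separate linear segments concrete, the sketch below evaluates a piecewise-linear prediction built from hinge functions of the form $\max(0, x - c)$ and $\max(0, c - x)$, the building blocks used by MARS. It only evaluates a model whose knot and coefficients are given; it does not implement the MARS search that chooses them. The knot at 4 L and all coefficients are hypothetical, not the fitted values for these data.

```python
def hinge(x, knot, direction=1):
    """max(0, x - knot) when direction is +1, max(0, knot - x) when it is -1."""
    return max(0.0, direction * (x - knot))


def piecewise_predict(x, intercept, terms):
    """Intercept plus a weighted sum of hinge terms (coefficient, knot, direction)."""
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: slope of -6 MPG/L below the 4 L knot, -3 MPG/L above it.
intercept = 30.0
terms = [(-3.0, 4.0, 1), (6.0, 4.0, -1)]

for displacement in (2.0, 4.0, 6.0):
    print(displacement, piecewise_predict(displacement, intercept, terms))
```

Each hinge term is zero on one side of its knot, so adding terms adds segments. The number of terms retained is the tuning parameter discussed in the text.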

For a single predictor, MARS can allow for up to five model terms (similar to the previous slopes and intercepts). Using cross-validation, we evaluated four candidate values for this tuning parameter to create the resampling profile shown in Fig. 2.4. The lowest RMSE value is associated with four terms, although the scale of change in the RMSE values indicates that there is some insensitivity to this tuning parameter. The RMSE associated with the optimal model was 4.2 MPG. The training set fit of the final four-term MARS model is shown in Fig. 2.5, where several linear segments were predicted.
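Choosing the tuning parameter from a resampling profile amounts to picking the candidate with the lowest resampled RMSE. The sketch below shows that step only. The profile is a placeholder: the candidate values of 2 through 5 terms are an assumption, and the RMSE values are invented for illustration apart from matching the text in that four terms gives the minimum of about 4.2 MPG.

```python
def select_tuning_value(profile):
    """Return the candidate value with the lowest resampled RMSE."""
    return min(profile, key=profile.get)


# Placeholder profile: number of terms -> cross-validated RMSE (MPG).
# Only the location of the minimum (four terms, about 4.2) follows the text.
resampling_profile = {2: 4.30, 3: 4.25, 4: 4.20, 5: 4.22}

best_terms = select_tuning_value(resampling_profile)
print(best_terms, resampling_profile[best_terms])
```

When the profile is as flat as described, the differences between candidates are small, so a simpler model with nearly the same RMSE would be a defensible choice as well.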

Based on these three models, the quadratic regression and MARS models were evaluated on the test set. Figure 2.6 shows these results. Both models fit very similarly. The test set RMSE was 4.72 MPG for the quadratic model and 4.69 MPG for the MARS model. Based on this, either model would be appropriate for the prediction of new car lines.

2 Themes

There are several aspects of the model building process that are worth discussing further, especially for those who are new to predictive modeling.

2.1 Data Splitting

Although discussed in the next chapter, how we allocate data to certain tasks (e.g., model building, evaluating performance) is an important aspect of modeling. For this example, the primary interest is to predict the fuel economy of new vehicles, which is not the same population as the data used to build the model. This means that, to some degree, we are testing how well the model extrapolates to a different population. If we were interested in predicting from the same population of vehicles (i.e., interpolation), taking a simple random sample of the data would be more appropriate. How the training and test sets are determined should reflect how the model will be applied.

How much data should be allocated to the training and test sets? It generally depends on the situation. If the pool of data is small, the data splitting decisions can be critical. A small test set would have limited utility as a judge of performance. In this case, a sole reliance on resampling techniques (i.e., no test set) might be more effective. Large data sets reduce the criticality of these decisions.
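When the goal is to predict within the same population, a simple random split is the natural alternative to the year-based split used in the example. A minimal Python sketch follows; the 20% test fraction and the fixed seed are arbitrary assumptions for illustration, not recommendations from the text.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out a fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers standing in for vehicles.
vehicle_ids = list(range(10))
train_ids, test_ids = random_split(vehicle_ids)
print(len(train_ids), len(test_ids))  # 8 2
```

With a pool this small, a two-sample test set would say little about performance, which is the situation where relying on resampling alone may be preferable.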

2.2 Predictor Data

This example has revolved around one of many predictors: the engine displacement. The original data contain many other factors, such as the number of cylinders, the type of transmission, and the manufacturer. An earnest attempt to predict the fuel economy would examine as many predictors as possible to improve performance. Using more predictors, it is likely that the RMSE for the new model cars can be driven down further. Some investigation into the data can also help. For example, none of the models were effective at predicting fuel economy when the engine displacement was small. Inclusion of predictors that target these types of vehicles would help improve performance.

An aspect of modeling that was not discussed here was feature selection: the process of determining the minimum set of relevant predictors needed by the model. This common task is discussed in Chap. 19.

2.3 Estimating Performance

Before using the test set, two techniques were employed to determine the effectiveness of the model. First, quantitative assessments of statistics (i.e., the RMSE) using resampling help the user understand how each technique would perform on new data. The other tool was to create simple visualizations of a model, such as plotting the observed and predicted values, to discover areas of the data where the model performs particularly well or poorly. This type of qualitative information is critical for improving models and is lost when the model is gauged only on summary statistics.

2.4 Evaluating Several Models

For these data, three different models were evaluated. It is our experience that some modeling practitioners have a favorite model that is relied on indiscriminately. The “No Free Lunch” Theorem (Wolpert, 1996) argues that, without having substantive information about the modeling problem, there is no single model that will always do better than any other model. Because of this, a strong case can be made to try a wide variety of techniques, then determine which model to focus on. In our example, a simple plot of the data shows that there is a nonlinear relationship between the outcome and the predictor. Given this knowledge, we might exclude linear models from consideration, but there is still a wide variety of techniques to evaluate. One might say that “model X is always the best performing model” but, for these data, a simple quadratic model is extremely competitive.

2.5 Model Selection

At some point in the process, a specific model must be chosen. This example demonstrated two types of model selection. First, we chose some models over others: the linear regression model did not fit well and was dropped. In this case, we chose between models. There was also a second type of model selection shown. For MARS, the tuning parameter was chosen using cross-validation. This was also model selection where we decided on the type of MARS model to use. In this case, we did the selection within different MARS models.

In either case, we relied on cross-validation and the test set to produce quantitative assessments of the models to help us make the choice. Because we focused on a single predictor, which will not often be the case, we also made visualizations of the model fit to help inform us. At the end of the process, the MARS and quadratic models appear to give equivalent performance. However, knowing that the quadratic model might not do well for vehicles with very large displacements, our intuition might tell us to favor the MARS model. One goal of this book is to help the user gain intuition regarding the strengths and weaknesses of different models to make informed decisions.

3 Summary

At face value, model building appears straightforward: pick a modeling technique, plug in data, and generate a prediction. While this approach will generate a predictive model, it will most likely not generate a reliable, trustworthy model for predicting new samples. To get this type of model, we must first understand the data and the objective of the modeling. Upon understanding the data and objectives, we then pre-process and split the data. Only after these steps do we finally proceed to building, evaluating, and selecting models.

Notes

1. One of our graduate professors once said “the only way to be comfortable with your data is to never look at it.”

References

Abdi H, Williams L (2010). “Principal Component Analysis.” Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.
Agresti A (2002). Categorical Data Analysis. Wiley–Interscience.
Ahdesmaki M, Strimmer K (2010). “Feature Selection in Omics Prediction Problems Using CAT Scores and False Nondiscovery Rate Control.” The Annals of Applied Statistics, 4(1), 503–519.
Alin A (2009). “Comparison of PLS Algorithms when Number of Objects is Much Larger than Number of Variables.” Statistical Papers, 50, 711–720.
Altman D, Bland J (1994). “Diagnostic Tests 3: Receiver Operating Characteristic Plots.” British Medical Journal, 309(6948), 188.
Ambroise C, McLachlan G (2002). “Selection Bias in Gene Extraction on the Basis of Microarray Gene–Expression Data.” Proceedings of the National Academy of Sciences, 99(10), 6562–6566.
Amit Y, Geman D (1997). “Shape Quantization and Recognition with Randomized Trees.” Neural Computation, 9, 1545–1588.
Armitage P, Berry G (1994). Statistical Methods in Medical Research. Blackwell Scientific Publications, Oxford, 3rd edition.
Artis M, Ayuso M, Guillen M (2002). “Detection of Automobile Insurance Fraud with Discrete Choice Models and Misclassified Claims.” The Journal of Risk and Insurance, 69(3), 325–340.
Austin P, Brunner L (2004). “Inflation of the Type I Error Rate When a Continuous Confounding Variable Is Categorized in Logistic Regression Analyses.” Statistics in Medicine, 23(7), 1159–1178.
Ayres I (2007). Super Crunchers: Why Thinking–By–Numbers Is The New Way To Be Smart. Bantam.
Barker M, Rayens W (2003). “Partial Least Squares for Discrimination.” Journal of Chemometrics, 17(3), 166–173.
Batista G, Prati R, Monard M (2004). “A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data.” ACM SIGKDD Explorations Newsletter, 6(1), 20–29.
Bauer E, Kohavi R (1999). “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants.” Machine Learning, 36, 105–142.
Becton Dickinson and Company (1991). ProbeTec ET Chlamydia trachomatis and Neisseria gonorrhoeae Amplified DNA Assays (Package Insert).
Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z (2000). “Tissue Classification with Gene Expression Profiles.” Journal of Computational Biology, 7(3), 559–583.
Bentley J (1975). “Multidimensional Binary Search Trees Used for Associative Searching.” Communications of the ACM, 18(9), 509–517.
Berglund A, Kettaneh N, Uppgård L, Wold S, Bach N, Cameron DR (2001). “The GIFI Approach to Non–Linear PLS Modeling.” Journal of Chemometrics, 15, 321–336.
Berglund A, Wold S (1997). “INLR, Implicit Non–Linear Latent Variable Regression.” Journal of Chemometrics, 11, 141–156.
Bergmeir C, Benitez JM (2012). “Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS.” Journal of Statistical Software, 46(7), 1–26.
Bergstra J, Casagrande N, Erhan D, Eck D, Kégl B (2006). “Aggregate Features and AdaBoost for Music Classification.” Machine Learning, 65, 473–484.
Berntsson P, Wold S (1986). “Comparison Between X-ray Crystallographic Data and Physiochemical Parameters with Respect to Their Information About the Calcium Channel Antagonist Activity of 4-Phenyl-1,4-Dihydropyridines.” Quantitative Structure-Activity Relationships, 5, 45–50.
Bhanu B, Lin Y (2003). “Genetic Algorithm Based Feature Selection for Target Detection in SAR Images.” Image and Vision Computing, 21, 591–608.
Bishop C (1995). Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
Bishop C (2006). Pattern Recognition and Machine Learning. Springer.
Bland J, Altman D (1995). “Statistics Notes: Multiple Significance Tests: The Bonferroni Method.” British Medical Journal, 310(6973), 170.
Bland J, Altman D (2000). “The Odds Ratio.” British Medical Journal, 320(7247), 1468.
Bohachevsky I, Johnson M, Stein M (1986). “Generalized Simulated Annealing for Function Optimization.” Technometrics, 28(3), 209–217.
Bone R, Balk R, Cerra F, Dellinger R, Fein A, Knaus W, Schein R, Sibbald W (1992). “Definitions for Sepsis and Organ Failure and Guidelines for the Use of Innovative Therapies in Sepsis.” Chest, 101(6), 1644–1655.
Boser B, Guyon I, Vapnik V (1992). “A Training Algorithm for Optimal Margin Classifiers.” In “Proceedings of the Fifth Annual Workshop on Computational Learning Theory,” pp. 144–152.
Boulesteix A, Strobl C (2009). “Optimal Classifier Selection and Negative Bias in Error Rate Estimation: An Empirical Study on High–Dimensional Prediction.” BMC Medical Research Methodology, 9(1), 85.
Box G, Cox D (1964). “An Analysis of Transformations.” Journal of the Royal Statistical Society. Series B (Methodological), pp. 211–252.
Box G, Hunter W, Hunter J (1978). Statistics for Experimenters. Wiley, New York.
Box G, Tidwell P (1962). “Transformation of the Independent Variables.” Technometrics, 4(4), 531–550.
Breiman L (1996a). “Bagging Predictors.” Machine Learning, 24(2), 123–140.
Breiman L (1996b). “Heuristics of Instability and Stabilization in Model Selection.” The Annals of Statistics, 24(6), 2350–2383.
Breiman L (1996c). “Technical Note: Some Properties of Splitting Criteria.” Machine Learning, 24(1), 41–47.
Breiman L (1998). “Arcing Classifiers.” The Annals of Statistics, 26, 123–140.
Breiman L (2000). “Randomizing Outputs to Increase Prediction Accuracy.” Machine Learning, 40, 229–242.
Breiman L (2001). “Random Forests.” Machine Learning, 45, 5–32.
Breiman L, Friedman J, Olshen R, Stone C (1984). Classification and Regression Trees. Chapman and Hall, New York.
Bridle J (1990). “Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition.” In “Neurocomputing: Algorithms, Architectures and Applications,” pp. 227–236. Springer–Verlag.
Brillinger D (2004). “Some Data Analyses Using Mutual Information.” Brazilian Journal of Probability and Statistics, 18(6), 163–183.
Brodnjak-Vončina D, Kodba Z, Novič M (2005). “Multivariate Data Analysis in Classification of Vegetable Oils Characterized by the Content of Fatty Acids.” Chemometrics and Intelligent Laboratory Systems, 75(1), 31–43.
Brown C, Davis H (2006). “Receiver Operating Characteristics Curves and Related Decision Measures: A Tutorial.” Chemometrics and Intelligent Laboratory Systems, 80(1), 24–38.
Bu G (2009). “Apolipoprotein E and Its Receptors in Alzheimer’s Disease: Pathways, Pathogenesis and Therapy.” Nature Reviews Neuroscience, 10(5), 333–344.
Buckheit J, Donoho DL (1995). “WaveLab and Reproducible Research.” In A Antoniadis, G Oppenheim (eds.), “Wavelets in Statistics,” pp. 55–82. Springer-Verlag, New York.
Burez J, Van den Poel D (2009). “Handling Class Imbalance In Customer Churn Prediction.” Expert Systems with Applications, 36(3), 4626–4636.
Cancedda N, Gaussier E, Goutte C, Renders J (2003). “Word–Sequence Kernels.” The Journal of Machine Learning Research, 3, 1059–1082.
Caputo B, Sim K, Furesjo F, Smola A (2002). “Appearance–Based Object Recognition Using SVMs: Which Kernel Should I Use?” In “Proceedings of NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision.”
Carolin C, Boulesteix A, Augustin T (2007). “Unbiased Split Selection for Classification Trees Based on the Gini Index.” Computational Statistics & Data Analysis, 52(1), 483–501.
Castaldi P, Dahabreh I, Ioannidis J (2011). “An Empirical Assessment of Validation Practices for Molecular Classifiers.” Briefings in Bioinformatics, 12(3), 189–202.
Chambers J (2008). Software for Data Analysis: Programming with R. Springer.
Chan K, Loh W (2004). “LOTUS: An Algorithm for Building Accurate and Comprehensible Logistic Regression Trees.” Journal of Computational and Graphical Statistics, 13(4), 826–852.
Chang CC, Lin CJ (2011). “LIBSVM: A Library for Support Vector Machines.” ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.
Chawla N, Bowyer K, Hall L, Kegelmeyer W (2002). “SMOTE: Synthetic Minority Over–Sampling Technique.” Journal of Artificial Intelligence Research, 16(1), 321–357.
Chun H, Keleş S (2010). “Sparse Partial Least Squares Regression for Simultaneous Dimension Reduction and Variable Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(1), 3–25.
Chung D, Keles S (2010). “Sparse Partial Least Squares Classification for High Dimensional Data.” Statistical Applications in Genetics and Molecular Biology, 9(1), 17.
Clark R (1997). “OptiSim: An Extended Dissimilarity Selection Method for Finding Diverse Representative Subsets.” Journal of Chemical Information and Computer Sciences, 37(6), 1181–1188.
Clark T (2004). “Can Out–of–Sample Forecast Comparisons Help Prevent Overfitting?” Journal of Forecasting, 23(2), 115–139.
Clemmensen L, Hastie T, Witten D, Ersboll B (2011). “Sparse Discriminant Analysis.” Technometrics, 53(4), 406–413.
Cleveland W (1979). “Robust Locally Weighted Regression and Smoothing Scatterplots.” Journal of the American Statistical Association, 74(368), 829–836.
Cleveland W, Devlin S (1988). “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting.” Journal of the American Statistical Association, pp. 596–610.
Cohen G, Hilario M, Pellegrini C, Geissbuhler A (2005). “SVM Modeling via a Hybrid Genetic Strategy. A Health Care Application.” In R Engelbrecht, AGC Lovis (eds.), “Connecting Medical Informatics and Bio–Informatics,” pp. 193–198. IOS Press.
Cohen J (1960). “A Coefficient of Agreement for Nominal Data.” Educational and Psychological Measurement, 20, 37–46.
Cohn D, Atlas L, Ladner R (1994). “Improving Generalization with Active Learning.” Machine Learning, 15(2), 201–221.
Cornell J (2002). Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data. Wiley, New York, NY.
Cortes C, Vapnik V (1995). “Support–Vector Networks.” Machine Learning, 20(3), 273–297.
Costa N, Lourenco J, Pereira Z (2011). “Desirability Function Approach: A Review and Performance Evaluation in Adverse Conditions.” Chemometrics and Intelligent Lab Systems, 107(2), 234–244.
Cover TM, Thomas JA (2006). Elements of Information Theory. Wiley–Interscience.
Craig-Schapiro R, Kuhn M, Xiong C, Pickering E, Liu J, Misko TP, Perrin R, Bales K, Soares H, Fagan A, Holtzman D (2011). “Multiplexed Immunoassay Panel Identifies Novel CSF Biomarkers for Alzheimer’s Disease Diagnosis and Prognosis.” PLoS ONE, 6(4), e18850.
Cruz-Monteagudo M, Borges F, Cordeiro MND (2011). “Jointly Handling Potency and Toxicity of Antimicrobial Peptidomimetics by Simple Rules from Desirability Theory and Chemoinformatics.” Journal of Chemical Information and Modeling, 51(12), 3060–3077.
Davison M (1983). Multidimensional Scaling. John Wiley and Sons, Inc.
Dayal B, MacGregor J (1997). “Improved PLS Algorithms.” Journal of Chemometrics, 11, 73–85.
de Jong S (1993). “SIMPLS: An Alternative Approach to Partial Least Squares Regression.” Chemometrics and Intelligent Laboratory Systems, 18, 251–263.
de Jong S, Ter Braak C (1994). “Short Communication: Comments on the PLS Kernel Algorithm.” Journal of Chemometrics, 8, 169–174.
de Leon M, Klunk W (2006). “Biomarkers for the Early Diagnosis of Alzheimer’s Disease.” The Lancet Neurology, 5(3), 198–199.
Defernez M, Kemsley E (1997). “The Use and Misuse of Chemometrics for Treating Classification Problems.” TrAC Trends in Analytical Chemistry, 16(4), 216–221.
DeLong E, DeLong D, Clarke-Pearson D (1988). “Comparing the Areas Under Two Or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach.” Biometrics, 44(3), 837–845.
Derksen S, Keselman H (1992). “Backward, Forward and Stepwise Automated Subset Selection Algorithms: Frequency of Obtaining Authentic and Noise Variables.” British Journal of Mathematical and Statistical Psychology, 45(2), 265–282.
Derringer G, Suich R (1980). “Simultaneous Optimization of Several Response Variables.” Journal of Quality Technology, 12(4), 214–219.
Dietterich T (2000). “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization.” Machine Learning, 40, 139–158.
Dillon W, Goldstein M (1984). Multivariate Analysis: Methods and Applications. Wiley, New York.
Dobson A (2002). An Introduction to Generalized Linear Models. Chapman & Hall/CRC.
Drucker H, Burges C, Kaufman L, Smola A, Vapnik V (1997). “Support Vector Regression Machines.” Advances in Neural Information Processing Systems, pp. 155–161.
Drummond C, Holte R (2000). “Explicitly Representing Expected Cost: An Alternative to ROC Representation.” In “Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,” pp. 198–207.
Duan K, Keerthi S (2005). “Which is the Best Multiclass SVM Method? An Empirical Study.” Multiple Classifier Systems, pp. 278–285.
Dudoit S, Fridlyand J, Speed T (2002). “Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data.” Journal of the American Statistical Association, 97(457), 77–87.
Duhigg C (2012). “How Companies Learn Your Secrets.” The New York Times. URL http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html.
Dunn W, Wold S (1990). “Pattern Recognition Techniques in Drug Design.” In C Hansch, P Sammes, J Taylor (eds.), “Comprehensive Medicinal Chemistry,” pp. 691–714. Pergamon Press, Oxford.
Dwyer D (2005). “Examples of Overfitting Encountered When Building Private Firm Default Prediction Models.” Technical report, Moody’s KMV.
Efron B (1983). “Estimating the Error Rate of a Prediction Rule: Improvement on Cross–Validation.” Journal of the American Statistical Association, pp. 316–331.
Efron B, Hastie T, Johnstone I, Tibshirani R (2004). “Least Angle Regression.” The Annals of Statistics, 32(2), 407–499.
Efron B, Tibshirani R (1986). “Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy.” Statistical Science, pp. 54–75.
Efron B, Tibshirani R (1997). “Improvements on Cross–Validation: The 632+ Bootstrap Method.” Journal of the American Statistical Association, 92(438), 548–560.
Eilers P, Boer J, van Ommen G, van Houwelingen H (2001). “Classification of Microarray Data with Penalized Logistic Regression.” In “Proceedings of SPIE,” volume 4266, p. 187.
Eugster M, Hothorn T, Leisch F (2008). “Exploratory and Inferential Analysis of Benchmark Experiments.” Ludwigs-Maximilians-Universität München, Department of Statistics, Tech. Rep, 30.
Everitt B, Landau S, Leese M, Stahl D (2011). Cluster Analysis. Wiley.
Ewald B (2006). “Post Hoc Choice of Cut Points Introduced Bias to Diagnostic Research.” Journal of Clinical Epidemiology, 59(8), 798–801.
Fanning K, Cogger K (1998). “Neural Network Detection of Management Fraud Using Published Financial Data.” International Journal of Intelligent Systems in Accounting, Finance & Management, 7(1), 21–41.
Faraway J (2005). Linear Models with R. Chapman & Hall/CRC, Boca Raton.
Fawcett T (2006). “An Introduction to ROC Analysis.” Pattern Recognition Letters, 27(8), 861–874.
Article MathSciNet Google Scholar
Fisher R (1936). “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, 7(2), 179–188.
Article Google Scholar
Forina M, Casale M, Oliveri P, Lanteri S (2009). “CAIMAN brothers: A Family of Powerful Classification and Class Modeling Techniques.” Chemometrics and Intelligent Laboratory Systems, 96(2), 239–245.
Article Google Scholar
Frank E, Wang Y, Inglis S, Holmes G (1998). “Using Model Trees for Classification.” Machine Learning.
Google Scholar
Frank E, Witten I (1998). “Generating Accurate Rule Sets Without Global Optimization.” Proceedings of the Fifteenth International Conference on Machine Learning, pp. 144–151.
Google Scholar
Free Software Foundation (June 2007). GNU General Public License.
Google Scholar
Freund Y (1995). “Boosting a Weak Learning Algorithm by Majority.” Information and Computation, 121, 256–285.
Article MathSciNet MATH Google Scholar
Freund Y, Schapire R (1996). “Experiments with a New Boosting Algorithm.” Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148–156.
Google Scholar
Friedman J (1989). “Regularized Discriminant Analysis.” Journal of the American Statistical Association, 84(405), 165–175.
Article MathSciNet Google Scholar
Friedman J (1991). “Multivariate Adaptive Regression Splines.” The Annals of Statistics, 19(1), 1–141.
Article MathSciNet MATH Google Scholar
Friedman J (2001). “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 29(5), 1189–1232.
Article MathSciNet MATH Google Scholar
Friedman J (2002). “Stochastic Gradient Boosting.” Computational Statistics and Data Analysis, 38(4), 367–378.
Article MathSciNet MATH Google Scholar
Friedman J, Hastie T, Tibshirani R (2000). “Additive Logistic Regression: A Statistical View of Boosting.” Annals of Statistics, 38, 337–374.
Article MathSciNet MATH Google Scholar
Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1–22.
Article Google Scholar
Geisser S (1993). Predictive Inference: An Introduction. Chapman and Hall.
Google Scholar
Geladi P, Kowalski B (1986). “Partial Least-Squares Regression: A Tutorial.” Analytica Chimica Acta, 185, 1–17.
Article Google Scholar
Geladi P, Manley M, Lestander T (2003). “Scatter Plotting in Multivariate Data Analysis.” Journal of Chemometrics, 17(8–9), 503–511.
Article Google Scholar
Gentleman R (2008). R Programming for Bioinformatics. CRC Press.
Google Scholar
Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber M, Iacus S, Irizarry R, Leisch F, Li C, Mächler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5(10), R80.
Article Google Scholar
Giuliano K, DeBiasio R, Dunlay R, Gough A, Volosky J, Zock J, Pavlakis G, Taylor D (1997). “High–Content Screening: A New Approach to Easing Key Bottlenecks in the Drug Discovery Process.” Journal of Biomolecular Screening, 2(4), 249–259.
Article Google Scholar
Goldberg D (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison–Wesley, Boston.
MATH Google Scholar
Golub G, Heath M, Wahba G (1979). “Generalized Cross–Validation as a Method for Choosing a Good Ridge Parameter.” Technometrics, 21(2), 215–223.
Article MathSciNet MATH Google Scholar
Good P (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer.
Google Scholar
Gowen A, Downey G, Esquerre C, O’Donnell C (2010). “Preventing Over–Fitting in PLS Calibration Models of Near-Infrared (NIR) Spectroscopy Data Using Regression Coefficients.” Journal of Chemometrics, 25, 375–381.
Article Google Scholar
Graybill F (1976). Theory and Application of the Linear Model. Wadsworth & Brooks, Pacific Grove, CA.
MATH Google Scholar
Guo Y, Hastie T, Tibshirani R (2007). “Regularized Linear Discriminant Analysis and its Application in Microarrays.” Biostatistics, 8(1), 86–100.
Article MATH Google Scholar
Gupta S, Hanssens D, Hardie B, Kahn W, Kumar V, Lin N, Ravishanker N, Sriram S (2006). “Modeling Customer Lifetime Value.” Journal of Service Research, 9(2), 139–155.
Article Google Scholar
Guyon I, Elisseeff A (2003). “An Introduction to Variable and Feature Selection.” The Journal of Machine Learning Research, 3, 1157–1182.
MATH Google Scholar
Guyon I, Weston J, Barnhill S, Vapnik V (2002). “Gene Selection for Cancer Classification Using Support Vector Machines.” Machine Learning, 46(1), 389–422.
Article MATH Google Scholar
Hall M, Smith L (1997). “Feature Subset Selection: A Correlation Based Filter Approach.” International Conference on Neural Information Processing and Intelligent Information Systems, pp. 855–858.
Google Scholar
Hall P, Hyndman R, Fan Y (2004). “Nonparametric Confidence Intervals for Receiver Operating Characteristic Curves.” Biometrika, 91, 743–750.
Article MathSciNet MATH Google Scholar
Hampel H, Frank R, Broich K, Teipel S, Katz R, Hardy J, Herholz K, Bokde A, Jessen F, Hoessler Y (2010). “Biomarkers for Alzheimer’s Disease: Academic, Industry and Regulatory Perspectives.” Nature Reviews Drug Discovery, 9(7), 560–574.
Article Google Scholar
Hand D, Till R (2001). “A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems.” Machine Learning, 45(2), 171–186.
Article MATH Google Scholar
Hanley J, McNeil B (1982). “The Meaning and Use of the Area under a Receiver Operating (ROC) Curvel Characteristic.” Radiology, 143(1), 29–36.
Article Google Scholar
Hardle W, Werwatz A, Müller M, Sperlich S, Hardle W, Werwatz A, Müller M, Sperlich S (2004). “Nonparametric Density Estimation.” In “Nonparametric and Semiparametric Models,” pp. 39–83. Springer Berlin Heidelberg.
Google Scholar
Harrell F (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, New York.
Book MATH Google Scholar
Hastie T, Pregibon D (1990). “Shrinking Trees.” Technical report, AT&T Bell Laboratories Technical Report.
Google Scholar
Hastie T, Tibshirani R (1990). Generalized Additive Models. Chapman & Hall/CRC.
Google Scholar
Hastie T, Tibshirani R (1996). “Discriminant Analysis by Gaussian Mixtures.” Journal of the Royal Statistical Society. Series B, pp. 155–176.
Google Scholar
Hastie T, Tibshirani R, Buja A (1994). “Flexible Discriminant Analysis by Optimal Scoring.” Journal of the American Statistical Association, 89(428), 1255–1270.
Article MathSciNet MATH Google Scholar
Hastie T, Tibshirani R, Friedman J (2008). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2 edition.
Google Scholar
Hawkins D (2004). “The Problem of Overfitting.” Journal of Chemical Information and Computer Sciences, 44(1), 1–12.
Article Google Scholar
Hawkins D, Basak S, Mills D (2003). “Assessing Model Fit by Cross–Validation.” Journal of Chemical Information and Computer Sciences, 43(2), 579–586.
Article Google Scholar
Henderson H, Velleman P (1981). “Building Multiple Regression Models Interactively.” Biometrics, pp. 391–411.
Google Scholar
Hesterberg T, Choi N, Meier L, Fraley C (2008). “Least Angle and L ₁ Penalized Regression: A Review.” Statistics Surveys, 2, 61–93.
Article MathSciNet MATH Google Scholar
Heyman R, Slep A (2001). “The Hazards of Predicting Divorce Without Cross-validation.” Journal of Marriage and the Family, 63(2), 473.
Article Google Scholar
Hill A, LaPan P, Li Y, Haney S (2007). “Impact of Image Segmentation on High–Content Screening Data Quality for SK–BR-3 Cells.” BMC Bioinformatics, 8(1), 340.
Article Google Scholar
Ho T (1998). “The Random Subspace Method for Constructing Decision Forests.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 340–354.
Google Scholar
Hoerl A (1970). “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics, 12(1), 55–67.
Article MathSciNet MATH Google Scholar
Holland J (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.
Google Scholar
Holland J (1992). Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA.
Google Scholar
Holmes G, Hall M, Frank E (1993). “Generating Rule Sets from Model Trees.” In “Australian Joint Conference on Artificial Intelligence,”.
Google Scholar
Hothorn T, Hornik K, Zeileis A (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics, 15(3), 651–674.
Article MathSciNet Google Scholar
Hothorn T, Leisch F, Zeileis A, Hornik K (2005). “The Design and Analysis of Benchmark Experiments.” Journal of Computational and Graphical Statistics, 14(3), 675–699.
Article MathSciNet Google Scholar
Hsieh W, Tang B (1998). “Applying Neural Network Models to Prediction and Data Analysis in Meteorology and Oceanography.” Bulletin of the American Meteorological Society, 79(9), 1855–1870.
Article Google Scholar
Hsu C, Lin C (2002). “A Comparison of Methods for Multiclass Support Vector Machines.” IEEE Transactions on Neural Networks, 13(2), 415–425.
Article Google Scholar
Huang C, Chang B, Cheng D, Chang C (2012). “Feature Selection and Parameter Optimization of a Fuzzy-Based Stock Selection Model Using Genetic Algorithms.” International Journal of Fuzzy Systems, 14(1), 65–75.
MathSciNet Google Scholar
Huuskonen J (2000). “Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology.” Journal of Chemical Information and Computer Sciences, 40(3), 773–777.
Article Google Scholar
Ihaka R, Gentleman R (1996). “R: A Language for Data Analysis and Graphics.” Journal of Computational and Graphical Statistics, 5(3), 299–314.
Google Scholar
Jeatrakul P, Wong K, Fung C (2010). “Classification of Imbalanced Data By Combining the Complementary Neural Network and SMOTE Algorithm.” Neural Information Processing. Models and Applications, pp. 152–159.
Google Scholar
Jerez J, Molina I, Garcia-Laencina P, Alba R, Ribelles N, Martin M, Franco L (2010). “Missing Data Imputation Using Statistical and Machine Learning Methods in a Real Breast Cancer Problem.” Artificial Intelligence in Medicine, 50, 105–115.
Article Google Scholar
John G, Kohavi R, Pfleger K (1994). “Irrelevant Features and the Subset Selection Problem.” Proceedings of the Eleventh International Conference on Machine Learning, 129, 121–129.
Google Scholar
Johnson K, Rayens W (2007). “Modern Classification Methods for Drug Discovery.” In A Dmitrienko, C Chuang-Stein, R D’Agostino (eds.), “Pharmaceutical Statistics Using SAS: A Practical Guide,” pp. 7–43. Cary, NC: SAS Institute Inc.
Google Scholar
Johnson R, Wichern D (2001). Applied Multivariate Statistical Analysis. Prentice Hall.
Google Scholar
Jolliffe I, Trendafilov N, Uddin M (2003). “A Modified Principal Component Technique Based on the lasso.” Journal of Computational and Graphical Statistics, 12(3), 531–547.
Article MathSciNet Google Scholar
Kansy M, Senner F, Gubernator K (1998). “Physiochemical High Throughput Screening: Parallel Artificial Membrane Permeation Assay in the Description of Passive Absorption Processes.” Journal of Medicinal Chemistry, 41, 1007–1010.
Article Google Scholar
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004). “kernlab - An S4 Package for Kernel Methods in R.” Journal of Statistical Software, 11(9), 1–20.
Article Google Scholar
Kearns M, Valiant L (1989). “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata.” In “Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing,”.
Google Scholar
Kim J, Basak J, Holtzman D (2009). “The Role of Apolipoprotein E in Alzheimer’s Disease.” Neuron, 63(3), 287–303.
Article Google Scholar
Kim JH (2009). “Estimating Classification Error Rate: Repeated Cross–Validation, Repeated Hold–Out and Bootstrap.” Computational Statistics & Data Analysis, 53(11), 3735–3745.
Article MathSciNet MATH Google Scholar
Kimball A (1957). “Errors of the Third Kind in Statistical Consulting.” Journal of the American Statistical Association, 52, 133–142.
Article Google Scholar
Kira K, Rendell L (1992). “The Feature Selection Problem: Traditional Methods and a New Algorithm.” Proceedings of the National Conference on Artificial Intelligence, pp. 129–129.
Google Scholar
Kline DM, Berardi VL (2005). “Revisiting Squared–Error and Cross–Entropy Functions for Training Neural Network Classifiers.” Neural Computing and Applications, 14(4), 310–318.
Article Google Scholar
Kohavi R (1995). “A Study of Cross–Validation and Bootstrap for Accuracy Estimation and Model Selection.” International Joint Conference on Artificial Intelligence, 14, 1137–1145.
Google Scholar
Kohavi R (1996). “Scaling Up the Accuracy of Naive–Bayes Classifiers: A Decision–Tree Hybrid.” In “Proceedings of the second international conference on knowledge discovery and data mining,” volume 7.
Google Scholar
Kohonen T (1995). Self–Organizing Maps. Springer.
Google Scholar
Kononenko I (1994). “Estimating Attributes: Analysis and Extensions of Relief.” In F Bergadano, L De Raedt (eds.), “Machine Learning: ECML–94,” volume 784, pp. 171–182. Springer Berlin / Heidelberg.
Google Scholar
Kuhn M (2008). “Building Predictive Models in R Using the caret Package.” Journal of Statistical Software, 28(5).
Google Scholar
Kuhn M (2010). “The caret Package Homepage.” URL http://caret.r-forge.r-project.org/.
Kuiper S (2008). “Introduction to Multiple Regression: How Much Is Your Car Worth?” Journal of Statistics Education, 16(3).
Google Scholar
Kvålseth T (1985). “Cautionary Note About R ².” American Statistician, 39(4), 279–285.
Google Scholar
Lachiche N, Flach P (2003). “Improving Accuracy and Cost of Two–Class and Multi–Class Probabilistic Classifiers using ROC Curves.” In “Proceedings of the Twentieth International Conference on Machine Learning,” volume 20, pp. 416–424.
Google Scholar
Larose D (2006). Data Mining Methods and Models. Wiley.
Google Scholar
Lavine B, Davidson C, Moores A (2002). “Innovative Genetic Algorithms for Chemoinformatics.” Chemometrics and Intelligent Laboratory Systems, 60(1), 161–171.
Article Google Scholar
Leach A, Gillet V (2003). An Introduction to Chemoinformatics. Springer.
Google Scholar
Leisch F (2002a). “Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis.” In W Härdle, B Rönz (eds.), “Compstat 2002 — Proceedings in Computational Statistics,” pp. 575–580. Physica Verlag, Heidelberg.
Google Scholar
Leisch F (2002b). “Sweave, Part I: Mixing R and LaTeX.” R News, 2(3), 28–31.
Google Scholar
Levy S (2010). “The AI Revolution is On.” Wired.
Google Scholar
Li J, Fine JP (2008). “ROC Analysis with Multiple Classes and Multiple Tests: Methodology and Its Application in Microarray Studies.” Biostatistics, 9(3), 566–576.
Article MATH Google Scholar
Lindgren F, Geladi P, Wold S (1993). “The Kernel Algorithm for PLS.” Journal of Chemometrics, 7, 45–59.
Article Google Scholar
Ling C, Li C (1998). “Data Mining for Direct Marketing: Problems and solutions.” In “Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining,” pp. 73–79.
Google Scholar
Lipinski C, Lombardo F, Dominy B, Feeney P (1997). “Experimental and Computational Approaches To Estimate Solubility and Permeability In Drug Discovery and Development Settings.” Advanced Drug Delivery Reviews, 23, 3–25.
Article Google Scholar
Liu B (2007). Web Data Mining. Springer Berlin / Heidelberg.
Google Scholar
Liu Y, Rayens W (2007). “PLS and Dimension Reduction for Classification.” Computational Statistics, pp. 189–208.
Google Scholar
Lo V (2002). “The True Lift Model: A Novel Data Mining Approach To Response Modeling in Database Marketing.” ACM SIGKDD Explorations Newsletter, 4(2), 78–86.
Article Google Scholar
Lodhi H, Saunders C, Shawe-Taylor J, Cristianini N, Watkins C (2002). “Text Classification Using String Kernels.” The Journal of Machine Learning Research, 2, 419–444.
MATH Google Scholar
Loh WY (2002). “Regression Trees With Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, 12, 361–386.
MathSciNet MATH Google Scholar
Loh WY (2010). “Tree–Structured Classifiers.” Wiley Interdisciplinary Reviews: Computational Statistics, 2, 364–369.
Article Google Scholar
Loh WY, Shih YS (1997). “Split Selection Methods for Classification Trees.” Statistica Sinica, 7, 815–840.
MathSciNet MATH Google Scholar
Mahé P, Ueda N, Akutsu T, Perret J, Vert J (2005). “Graph Kernels for Molecular Structure–Activity Relationship Analysis with Support Vector Machines.” Journal of Chemical Information and Modeling, 45(4), 939–951.
Article Google Scholar
Mahé P, Vert J (2009). “Graph Kernels Based on Tree Patterns for Molecules.” Machine Learning, 75(1), 3–35.
Article Google Scholar
Maindonald J, Braun J (2007). Data Analysis and Graphics Using R. Cambridge University Press, 2nd edition.
Google Scholar
Mandal A, Johnson K, Wu C, Bornemeier D (2007). “Identifying Promising Compounds in Drug Discovery: Genetic Algorithms and Some New Statistical Techniques.” Journal of Chemical Information and Modeling, 47(3), 981–988.
Article Google Scholar
Mandal A, Wu C, Johnson K (2006). “SELC: Sequential Elimination of Level Combinations by Means of Modified Genetic Algorithms.” Technometrics, 48(2), 273–283.
Article MathSciNet Google Scholar
Martin J, Hirschberg D (1996). “Small Sample Statistics for Classification Error Rates I: Error Rate Measurements.” Department of Informatics and Computer Science Technical Report.
Google Scholar
Martin T, Harten P, Young D, Muratov E, Golbraikh A, Zhu H, Tropsha A (2012). “Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?” Journal of Chemical Information and Modeling, 52(10), 2570–2578.
Article Google Scholar
Massy W (1965). “Principal Components Regression in Exploratory Statistical Research.” Journal of the American Statistical Association, 60, 234–246.
Article Google Scholar
McCarren P, Springer C, Whitehead L (2011). “An Investigation into Pharmaceutically Relevant Mutagenicity Data and the Influence on Ames Predictive Potential.” Journal of Cheminformatics, 3(51).
Google Scholar
McClish D (1989). “Analyzing a Portion of the ROC Curve.” Medical Decision Making, 9, 190–195.
Article Google Scholar
Melssen W, Wehrens R, Buydens L (2006). “Supervised Kohonen Networks for Classification Problems.” Chemometrics and Intelligent Laboratory Systems, 83(2), 99–113.
Article Google Scholar
Mente S, Lombardo F (2005). “A Recursive–Partitioning Model for Blood–Brain Barrier Permeation.” Journal of Computer–Aided Molecular Design, 19(7), 465–481.
Article Google Scholar
Menze B, Kelm B, Splitthoff D, Koethe U, Hamprecht F (2011). “On Oblique Random Forests.” Machine Learning and Knowledge Discovery in Databases, pp. 453–469.
Google Scholar
Mevik B, Wehrens R (2007). “The pls Package: Principal Component and Partial Least Squares Regression in R.” Journal of Statistical Software, 18(2), 1–24.
Article Google Scholar
Michailidis G, de Leeuw J (1998). “The Gifi System Of Descriptive Multivariate Analysis.” Statistical Science, 13, 307–336.
Article MathSciNet MATH Google Scholar
Milborrow S (2012). Notes On the earth Package. URL http://cran.r-project.org/package=earth.
Min S, Lee J, Han I (2006). “Hybrid Genetic Algorithms and Support Vector Machines for Bankruptcy Prediction.” Expert Systems with Applications, 31(3), 652–660.
Article Google Scholar
Mitchell M (1998). An Introduction to Genetic Algorithms. MIT Press.
Google Scholar
Molinaro A (2005). “Prediction Error Estimation: A Comparison of Resampling Methods.” Bioinformatics, 21(15), 3301–3307.
Article Google Scholar
Molinaro A, Lostritto K, Van Der Laan M (2010). “partDSA: Deletion/Substitution/Addition Algorithm for Partitioning the Covariate Space in Prediction.” Bioinformatics, 26(10), 1357–1363.
Article Google Scholar
Montgomery D, Runger G (1993). “Gauge Capability and Designed Experiments. Part I: Basic Methods.” Quality Engineering, 6(1), 115–135.
Article Google Scholar
Muenchen R (2009). R for SAS and SPSS Users. Springer.
Google Scholar
Myers R (1994). Classical and Modern Regression with Applications. PWS-KENT Publishing Company, Boston, MA, second edition.
Google Scholar
Myers R, Montgomery D (2009). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley, New York, NY.
MATH Google Scholar
Neal R (1996). Bayesian Learning for Neural Networks. Springer-Verlag.
Google Scholar
Nelder J, Mead R (1965). “A Simplex Method for Function Minimization.” The Computer Journal, 7(4), 308–313.
Article MATH Google Scholar
Netzeva T, Worth A, Aldenberg T, Benigni R, Cronin M, Gramatica P, Jaworska J, Kahn S, Klopman G, Marchant C (2005). “Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure–Activity Relationships.” In “The Report and Recommendations of European Centre for the Validation of Alternative Methods Workshop 52,” volume 33, pp. 1–19.
Google Scholar
Niblett T (1987). “Constructing Decision Trees in Noisy Domains.” In I Bratko, N Lavrač (eds.), “Progress in Machine Learning: Proceedings of EWSL–87,” pp. 67–78. Sigma Press, Bled, Yugoslavia.
Google Scholar
Olden J, Jackson D (2000). “Torturing Data for the Sake of Generality: How Valid Are Our Regression Models?” Ecoscience, 7(4), 501–510.
Google Scholar
Olsson D, Nelson L (1975). “The Nelder–Mead Simplex Procedure for Function Minimization.” Technometrics, 17(1), 45–51.
Article MATH Google Scholar
Osuna E, Freund R, Girosi F (1997). “Support Vector Machines: Training and Applications.” Technical report, MIT Artificial Intelligence Laboratory.
Google Scholar
Ozuysal M, Calonder M, Lepetit V, Fua P (2010). “Fast Keypoint Recognition Using Random Ferns.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.
Article Google Scholar
Park M, Hastie T (2008). “Penalized Logistic Regression for Detecting Gene Interactions.” Biostatistics, 9(1), 30.
Article MATH Google Scholar
Pepe MS, Longton G, Janes H (2009). “Estimation and Comparison of Receiver Operating Characteristic Curves.” Stata Journal, 9(1), 1–16.
Google Scholar
Perrone M, Cooper L (1993). “When Networks Disagree: Ensemble Methods for Hybrid Neural Networks.” In RJ Mammone (ed.), “Artificial Neural Networks for Speech and Vision,” pp. 126–142. Chapman & Hall, London.
Google Scholar
Piersma A, Genschow E, Verhoef A, Spanjersberg M, Brown N, Brady M, Burns A, Clemann N, Seiler A, Spielmann H (2004). “Validation of the Postimplantation Rat Whole-embryo Culture Test in the International ECVAM Validation Study on Three In Vitro Embryotoxicity Tests.” Alternatives to Laboratory Animals, 32, 275–307.
Google Scholar
Platt J (2000). “Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods.” In B Bartlett, B Schölkopf, D Schuurmans, A Smola (eds.), “Advances in Kernel Methods Support Vector Learning,” pp. 61–74. Cambridge, MA: MIT Press.
Google Scholar
Provost F, Domingos P (2003). “Tree Induction for Probability–Based Ranking.” Machine Learning, 52(3), 199–215.
Article MATH Google Scholar
Provost F, Fawcett T, Kohavi R (1998). “The Case Against Accuracy Estimation for Comparing Induction Algorithms.” Proceedings of the Fifteenth International Conference on Machine Learning, pp. 445–453.
Google Scholar
Quinlan R (1987). “Simplifying Decision Trees.” International Journal of Man–Machine Studies, 27(3), 221–234.
Article Google Scholar
Quinlan R (1992). “Learning with Continuous Classes.” Proceedings of the 5th Australian Joint Conference On Artificial Intelligence, pp. 343–348.
Google Scholar
Quinlan R (1993a). “Combining Instance–Based and Model–Based Learning.” Proceedings of the Tenth International Conference on Machine Learning, pp. 236–243.
Google Scholar
Quinlan R (1993b). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.
Google Scholar
Quinlan R (1996a). “Bagging, Boosting, and C4.5.” In “In Proceedings of the Thirteenth National Conference on Artificial Intelligence,”.
Google Scholar
Quinlan R (1996b). “Improved use of continuous attributes in C4.5.” Journal of Artificial Intelligence Research, 4, 77–90.
Google Scholar
Quinlan R, Rivest R (1989). “Inferring Decision Trees Using the Minimum Description Length Principle.” Information and computation, 80(3), 227–248.
Article MathSciNet MATH Google Scholar
Radcliffe N, Surry P (2011). “Real–World Uplift Modelling With Significance–Based Uplift Trees.” Technical report, Stochastic Solutions.
Google Scholar
Rännar S, Lindgren F, Geladi P, Wold S (1994). “A PLS Kernel Algorithm for Data Sets with Many Variables and Fewer Objects. Part 1: Theory and Algorithm.” Journal of Chemometrics, 8, 111–125.
Google Scholar
R Development Core Team (2008). R: Regulatory Compliance and Validation Issues A Guidance Document for the Use of R in Regulated Clinical Trial Environments. R Foundation for Statistical Computing, Vienna, Austria.
Google Scholar
R Development Core Team (2010). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Google Scholar
Reshef D, Reshef Y, Finucane H, Grossman S, McVean G, Turnbaugh P, Lander E, Mitzenmacher M, Sabeti P (2011). “Detecting Novel Associations in Large Data Sets.” Science, 334(6062), 1518–1524.
Article Google Scholar
Richardson M, Dominowska E, Ragno R (2007). “Predicting Clicks: Estimating the Click–Through Rate for New Ads.” In “Proceedings of the 16^th International Conference on the World Wide Web,” pp. 521–530.
Google Scholar
Ridgeway G (2007). “Generalized Boosted Models: A Guide to the gbm Package.” URL http://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf.
Ripley B (1995). “Statistical Ideas for Selecting Network Architectures.” Neural Networks: Artificial Intelligence and Industrial Applications, pp. 183–190.
Google Scholar
Ripley B (1996). Pattern Recognition and Neural Networks. Cambridge University Press.
Google Scholar
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Muller M (2011). “pROC: an open-source package for R and S+ to analyze and compare ROC curves.” BMC Bioinformatics, 12(1), 77.
Google Scholar
Robnik-Sikonja M, Kononenko I (1997). “An Adaptation of Relief for Attribute Estimation in Regression.” Proceedings of the Fourteenth International Conference on Machine Learning, pp. 296–304.
Google Scholar
Rodriguez M (2011). “The Failure of Predictive Modeling and Why We Follow the Herd.” Technical report, Concepcion, Martinez & Bellido.
Google Scholar
Ruczinski I, Kooperberg C, Leblanc M (2003). “Logic Regression.” Journal of Computational and Graphical Statistics, 12(3), 475–511.
Article MathSciNet MATH Google Scholar
Rumelhart D, Hinton G, Williams R (1986). “Learning Internal Representations by Error Propagation.” In “Parallel Distributed Processing: Explorations in the Microstructure of Cognition,” The MIT Press.
Google Scholar
Rzepakowski P, Jaroszewicz S (2012). “Uplift Modeling in Direct Marketing.” Journal of Telecommunications and Information Technology, 2, 43–50.
Google Scholar
Saar-Tsechansky M, Provost F (2007a). “Decision–Centric Active Learning of Binary–Outcome Models.” Information Systems Research, 18(1), 4–22.
Article MATH Google Scholar
Saar-Tsechansky M, Provost F (2007b). “Handling Missing Values When Applying Classification Models.” Journal of Machine Learning Research, 8, 1625–1657.
MATH Google Scholar
Saeys Y, Inza I, Larranaga P (2007). “A Review of Feature Selection Techniques in Bioinformatics.” Bioinformatics, 23(19), 2507–2517.
Article Google Scholar
Schapire R (1990). “The Strength of Weak Learnability.” Machine Learning, 45, 197–227.
Google Scholar
Schapire YFR (1999). “Adaptive Game Playing Using Multiplicative Weights.” Games and Economic Behavior, 29, 79–103.
Article MathSciNet MATH Google Scholar
Schmidberger M, Morgan M, Eddelbuettel D, Yu H, Tierney L, Mansmann U (2009). “State–of–the–Art in Parallel Computing with R.” Journal of Statistical Software, 31(1).
Google Scholar
Serneels S, Nolf ED, Espen PV (2006). “Spatial Sign Pre-processing: A Simple Way to Impart Moderate Robustness to Multivariate Estimators.” Journal of Chemical Information and Modeling, 46(3), 1402–1409.
Article Google Scholar
Shachtman N (2011). “Pentagon’s Prediction Software Didn’t Spot Egypt Unrest.” Wired.
Google Scholar
Shannon C (1948). “A Mathematical Theory of Communication.” The Bell System Technical Journal, 27(3), 379–423.
Article MathSciNet MATH Google Scholar
Siegel E (2011). “Uplift Modeling: Predictive Analytics Can’t Optimize Marketing Decisions Without It.” Technical report, Prediction Impact Inc.
Google Scholar
Simon R, Radmacher M, Dobbin K, McShane L (2003). “Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification.” Journal of the National Cancer Institute, 95(1), 14–18.
Article Google Scholar
Smola A (1996). “Regression Estimation with Support Vector Learning Machines.” Master’s thesis, Technische Universit at Munchen.
Google Scholar
Spector P (2008). Data Manipulation with R. Springer.
Google Scholar
Steyerberg E (2010). Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, 1st ed. softcover of orig. ed. 2009 edition.
Google Scholar
Stone M, Brooks R (1990). “Continuum Regression: Cross-validated Sequentially Constructed Prediction Embracing Ordinary Least Squares, Partial Least Squares, and Principal Component Regression.” Journal of the Royal Statistical Society, Series B, 52, 237–269.
MathSciNet MATH Google Scholar
Strobl C, Boulesteix A, Zeileis A, Hothorn T (2007). “Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution.” BMC Bioinformatics, 8(1), 25.
Article Google Scholar
Suykens J, Vandewalle J (1999). “Least Squares Support Vector Machine Classifiers.” Neural processing letters, 9(3), 293–300.
Article MathSciNet MATH Google Scholar
Tetko I, Tanchuk V, Kasheva T, Villa A (2001). “Estimation of Aqueous Solubility of Chemical Compounds Using E–State Indices.” Journal of Chemical Information and Computer Sciences, 41(6), 1488–1493.
Article Google Scholar
Tibshirani R (1996). “Regression Shrinkage and Selection via the lasso.” Journal of the Royal Statistical Society Series B (Methodological), 58(1), 267–288.
MathSciNet MATH Google Scholar
Tibshirani R, Hastie T, Narasimhan B, Chu G (2002). “Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression.” Proceedings of the National Academy of Sciences, 99(10), 6567–6572.
Article Google Scholar
Tibshirani R, Hastie T, Narasimhan B, Chu G (2003). “Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays.” Statistical Science, 18(1), 104–117.
Article MathSciNet MATH Google Scholar
Ting K (2002). “An Instance–Weighting Method to Induce Cost–Sensitive Trees.” IEEE Transactions on Knowledge and Data Engineering, 14(3), 659–665.
Article Google Scholar
Tipping M (2001). “Sparse Bayesian Learning and the Relevance Vector Machine.” Journal of Machine Learning Research, 1, 211–244.
MathSciNet MATH Google Scholar
Titterington M (2010). “Neural Networks.” Wiley Interdisciplinary Reviews: Computational Statistics, 2(1), 1–8.
Article MATH Google Scholar
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman R (2001). “Missing Value Estimation Methods for DNA Microarrays.” Bioinformatics, 17(6), 520–525.
Article Google Scholar
Tumer K, Ghosh J (1996). “Analysis of Decision Boundaries in Linearly Combined Neural Classifiers.” Pattern Recognition, 29(2), 341–348.
Article Google Scholar
US Commodity Futures Trading Commission and US Securities & Exchange Commission (2010). Findings Regarding the Market Events of May 6, 2010.
Google Scholar
Valiant L (1984). “A Theory of the Learnable.” Communications of the ACM, 27, 1134–1142.
Van Der Putten P, Van Someren M (2004). “A Bias–Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000.” Machine Learning, 57(1), 177–195.
Van Hulse J, Khoshgoftaar T, Napolitano A (2007). “Experimental Perspectives On Learning From Imbalanced Data.” In “Proceedings of the 24th International Conference On Machine Learning,” pp. 935–942.
Vapnik V (2010). The Nature of Statistical Learning Theory. Springer.
Varma S, Simon R (2006). “Bias in Error Estimation When Using Cross–Validation for Model Selection.” BMC Bioinformatics, 7(1), 91.
Varmuza K, He P, Fang K (2003). “Boosting Applied to Classification of Mass Spectral Data.” Journal of Data Science, 1, 391–404.
Venables W, Ripley B (2002). Modern Applied Statistics with S. Springer.
Venables W, Smith D, the R Development Core Team (2003). An Introduction to R. R Foundation for Statistical Computing, Vienna, Austria, version 1.6.2 edition. ISBN 3-901167-55-2, URL http://www.R-project.org.
Venkatraman E (2000). “A Permutation Test to Compare Receiver Operating Characteristic Curves.” Biometrics, 56(4), 1134–1138.
Veropoulos K, Campbell C, Cristianini N (1999). “Controlling the Sensitivity of Support Vector Machines.” Proceedings of the International Joint Conference on Artificial Intelligence, 1999, 55–60.
Verzani J (2002). “simpleR – Using R for Introductory Statistics.” URL http://www.math.csi.cuny.edu/Statistics/R/simpleR.
Wager TT, Hou X, Verhoest PR, Villalobos A (2010). “Moving Beyond Rules: The Development of a Central Nervous System Multiparameter Optimization (CNS MPO) Approach To Enable Alignment of Druglike Properties.” ACS Chemical Neuroscience, 1(6), 435–449.
Wallace C (2005). Statistical and Inductive Inference by Minimum Message Length. Springer–Verlag.
Wang C, Venkatesh S (1984). “Optimal Stopping and Effective Machine Complexity in Learning.” Advances in NIPS, pp. 303–310.
Wang Y, Witten I (1997). “Inducing Model Trees for Continuous Classes.” Proceedings of the Ninth European Conference on Machine Learning, pp. 128–137.
Weiss G, Provost F (2001a). “The Effect of Class Distribution on Classifier Learning: An Empirical Study.” Department of Computer Science, Rutgers University.
Weiss G, Provost F (2001b). “The Effect of Class Distribution On Classifier Learning: An Empirical Study.” Technical Report ML-TR-44, Department of Computer Science, Rutgers University.
Welch B (1939). “Note on Discriminant Functions.” Biometrika, 31, 218–220.
Westfall P, Young S (1993). Resampling–Based Multiple Testing: Examples and Methods for P–Value Adjustment. Wiley.
Westphal C (2008). Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies. CRC Press.
Whittingham M, Stephens P, Bradbury R, Freckleton R (2006). “Why Do We Still Use Stepwise Modelling in Ecology and Behaviour?” Journal of Animal Ecology, 75(5), 1182–1189.
Willett P (1999). “Dissimilarity–Based Algorithms for Selecting Structurally Diverse Sets of Compounds.” Journal of Computational Biology, 6(3), 447–457.
Williams G (2011). Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery. Springer.
Witten D, Tibshirani R (2009). “Covariance–Regularized Regression and Classification For High Dimensional Problems.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 71(3), 615–636.
Witten D, Tibshirani R (2011). “Penalized Classification Using Fisher’s Linear Discriminant.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73(5), 753–772.
Wold H (1966). “Estimation of Principal Components and Related Models by Iterative Least Squares.” In P Krishnaiah (ed.), “Multivariate Analyses,” pp. 391–420. Academic Press, New York.
Wold H (1982). “Soft Modeling: The Basic Design and Some Extensions.” In K Joreskog, H Wold (eds.), “Systems Under Indirect Observation: Causality, Structure, Prediction,” pt. 2, pp. 1–54. North–Holland, Amsterdam.
Wold S (1995). “PLS for Multivariate Linear Modeling.” In H van de Waterbeemd (ed.), “Chemometric Methods in Molecular Design,” pp. 195–218. VCH, Weinheim.
Wold S, Johansson M, Cocchi M (1993). “PLS–Partial Least-Squares Projections to Latent Structures.” In H Kubinyi (ed.), “3D QSAR in Drug Design,” volume 1, pp. 523–550. Kluwer Academic Publishers, The Netherlands.
Wold S, Martens H, Wold H (1983). “The Multivariate Calibration Problem in Chemistry Solved by the PLS Method.” In “Proceedings from the Conference on Matrix Pencils,” Springer–Verlag, Heidelberg.
Wolpert D (1996). “The Lack of a priori Distinctions Between Learning Algorithms.” Neural Computation, 8(7), 1341–1390.
Yeh I (1998). “Modeling of Strength of High-Performance Concrete Using Artificial Neural Networks.” Cement and Concrete Research, 28(12), 1797–1808.
Yeh I (2006). “Analysis of Strength of Concrete Using Design of Experiments and Neural Networks.” Journal of Materials in Civil Engineering, 18, 597–604.
Youden W (1950). “Index for Rating Diagnostic Tests.” Cancer, 3(1), 32–35.
Zadrozny B, Elkan C (2001). “Obtaining Calibrated Probability Estimates from Decision Trees and Naive Bayesian Classifiers.” In “Proceedings of the 18th International Conference on Machine Learning,” pp. 609–616. Morgan Kaufmann.
Zeileis A, Hothorn T, Hornik K (2008). “Model–Based Recursive Partitioning.” Journal of Computational and Graphical Statistics, 17(2), 492–514.
Zhu J, Hastie T (2005). “Kernel Logistic Regression and the Import Vector Machine.” Journal of Computational and Graphical Statistics, 14(1), 185–205.
Zou H, Hastie T (2005). “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B, 67(2), 301–320.
Zou H, Hastie T, Tibshirani R (2004). “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics, 15, 2006.

Author information

Authors and Affiliations

Division of Nonclinical Statistics, Pfizer Global Research and Development, Groton, Connecticut, USA
Max Kuhn
Arbor Analytics, Saline, Michigan, USA
Kjell Johnson

About this chapter

Cite this chapter

Kuhn, M., Johnson, K. (2013). A Short Tour of the Predictive Modeling Process. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_2

DOI: https://doi.org/10.1007/978-1-4614-6849-3_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6848-6
Online ISBN: 978-1-4614-6849-3
eBook Packages: Mathematics and Statistics (R0)