Abstract
In this paper, we examine the use of machine learning to improve a rooftop detection process, one step in a vision system that recognizes buildings in overhead imagery. We review the problem of analyzing aerial images and describe an existing system that detects buildings in such images. We briefly review four algorithms that we selected to improve rooftop detection. The data sets were highly skewed and the cost of mistakes differed between the classes, so we used ROC analysis to evaluate the methods under varying error costs. We report three experiments designed to illuminate facets of applying machine learning to the image analysis task. One investigated learning with all available images to determine the best performing method. Another focused on within-image learning, in which we derived training and testing data from the same image. A final experiment addressed between-image learning, in which training and testing sets came from different images. Results suggest that useful generalization occurred when training and testing on data derived from images differing in location and in aspect. They demonstrate that under most conditions, naive Bayes exceeded the accuracy of other methods and a handcrafted classifier, the solution currently used in the building detection system.
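The ROC-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy scores, and the cost values are all hypothetical, and it assumes each classifier outputs a numeric score where higher means "more likely a rooftop". It builds the ROC curve by sweeping a threshold, computes the area under it with the trapezoid rule, and picks the operating point with the lowest total cost for a given pair of error costs.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Sweep a threshold from high to low and record one ROC point per distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one example of each class")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def cheapest_point(points: List[Point], pos: int, neg: int,
                   cost_fn: float, cost_fp: float) -> Point:
    """Return the ROC point with the lowest total misclassification cost."""
    def total_cost(p: Point) -> float:
        fpr, tpr = p
        return cost_fn * (1.0 - tpr) * pos + cost_fp * fpr * neg
    return min(points, key=total_cost)


# Hypothetical skewed data: 2 positives (rooftops) among 8 candidates.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 0, 0, 0]

curve = roc_points(scores, labels)
print("AUC:", auc(curve))
# Missing a rooftop is 10x as costly as a false alarm, then the reverse.
print("FN costly:", cheapest_point(curve, 2, 6, cost_fn=10.0, cost_fp=1.0))
print("FP costly:", cheapest_point(curve, 2, 6, cost_fn=1.0, cost_fp=10.0))
```

Because the curve is built from rates rather than raw counts, its shape does not depend on the class skew, and changing the cost pair only moves the preferred operating point along the same curve.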
Maloof, M., Langley, P., Binford, T. et al. Improved Rooftop Detection in Aerial Images with Machine Learning. Machine Learning 53, 157–191 (2003). https://doi.org/10.1023/A:1025623527461
DOI: https://doi.org/10.1023/A:1025623527461