Abstract
Recent statistical performance studies of search algorithms in difficult combinatorial problems have demonstrated the benefits of randomising and restarting the search procedure. Specifically, it has been found that if the search cost distribution of the non-restarted randomised search exhibits a slower-than-exponential decay (that is, a “heavy tail”), restarts can reduce the search cost expectation. We report on an empirical study of randomised restarted search in ILP. Our experiments conducted on a high-performance distributed computing platform provide an extensive statistical performance sample of five search algorithms operating on two principally different classes of ILP problems, one represented by an artificially generated graph problem and the other by three traditional classification benchmarks (mutagenicity, carcinogenicity, finite element mesh design). The sample allows us to (1) estimate the conditional expected value of the search cost (measured by the total number of clauses explored) given the minimum clause score required and a “cutoff” value (the number of clauses examined before the search is restarted), (2) estimate the conditional expected clause score given the cutoff value and the invested search cost, and (3) compare the performance of randomised restarted search strategies to a deterministic non-restarted search. Our findings indicate striking similarities across the five search algorithms and the four domains, in terms of the basic trends of both the statistics (1) and (2). Also, we observe that the cutoff value is critical for the performance of the search algorithm, and using its optimal value in a randomised restarted search may decrease the mean search cost (by several orders of magnitude) or increase the mean achieved score significantly with respect to that obtained with a deterministic non-restarted search.
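The restart mechanism the abstract describes — abandon the current search once a fixed "cutoff" number of clauses has been examined, then restart from a fresh random point — can be sketched as below. This is a minimal illustrative sketch only: the function names, the toy scoring problem, and all parameters are assumptions for demonstration, not the paper's actual ILP implementation (which searches a clause subsumption lattice).

```python
import random

def randomised_restarted_search(evaluate, sample_start, neighbours,
                                min_score, cutoff, max_total=100_000,
                                rng=None):
    """Restart a randomised search whenever `cutoff` candidates have been
    examined without reaching `min_score`.  Returns the best candidate
    found, its score, and the total number of candidates examined."""
    rng = rng or random.Random(0)
    best, best_score, total = None, float("-inf"), 0
    while total < max_total:
        current = sample_start(rng)              # random restart point
        for _ in range(cutoff):                  # examine at most `cutoff` candidates
            total += 1
            score = evaluate(current)
            if score > best_score:
                best, best_score = current, score
            if best_score >= min_score:          # required score reached: stop
                return best, best_score, total
            current = neighbours(current, rng)   # move to a random refinement
    return best, best_score, total               # budget exhausted

# Hypothetical toy usage: find x = 5 by a random walk over integers.
best, score, examined = randomised_restarted_search(
    evaluate=lambda x: -abs(x - 5),
    sample_start=lambda rng: rng.randint(0, 10),
    neighbours=lambda x, rng: x + rng.choice([-1, 1]),
    min_score=0, cutoff=20)
```

As the abstract notes, the choice of `cutoff` is critical: too small and the search restarts before it can reach a good score; too large and a heavy-tailed run dominates the expected cost.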
Editors: Rui Camacho
Železný, F., Srinivasan, A. & Page, C.D. Randomised restarted search in ILP. Mach Learn 64, 183–208 (2006). https://doi.org/10.1007/s10994-006-7733-9