Abstract
In this work, we introduce a powerful and general feature representation based on a locality-sensitive hashing scheme called random hyperplane hashing. We address the problem of centrally learning (linear) classification models from data distributed across a number of clients, and subsequently deploying these models on the same clients. Our main goal is to balance the accuracy of the individual classifiers against the various costs of their deployment, including communication costs and computational complexity. We therefore systematically study how well learning schemes designed for sparse high-dimensional data adapt to the much denser representations produced by random hyperplane hashing, how much data must be transmitted to preserve enough of the semantics of each document, and how the representations affect the overall computational complexity. This paper provides theoretical results in the form of error bounds and margin-based bounds to analyze the performance of classifiers learned over the hash-based representation. We also present empirical evidence illustrating the attractive properties of random hyperplane hashing over the conventional bag-of-words baseline, with and without feature selection.
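To illustrate the representation the abstract refers to, the following is a minimal NumPy sketch of random hyperplane hashing, not the authors' implementation: each bit of a document's signature is the sign of its projection onto a random Gaussian hyperplane, and the fraction of agreeing bits between two signatures estimates the angle (and hence the cosine similarity) between the original vectors. The function names and parameter choices are illustrative.

```python
import numpy as np

def rhh_signature(x, planes):
    """Signature bits: sign of the projection onto each random hyperplane."""
    return (planes @ x >= 0).astype(np.uint8)

def estimated_cosine(sig_a, sig_b):
    """For random hyperplanes, P[bit agrees] = 1 - angle/pi,
    so angle ~ pi * (1 - agreement) and cosine ~ cos(angle)."""
    agreement = np.mean(sig_a == sig_b)
    return np.cos(np.pi * (1.0 - agreement))

rng = np.random.default_rng(0)
d, k = 1000, 4096                      # input dimension, number of signature bits
planes = rng.standard_normal((k, d))   # k random hyperplanes (shared by all clients)

x = rng.standard_normal(d)
y = x + 0.5 * rng.standard_normal(d)   # a vector correlated with x

true_cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
est_cos = estimated_cosine(rhh_signature(x, planes), rhh_signature(y, planes))
```

With enough bits, `est_cos` concentrates around `true_cos`; each client only needs to transmit the k-bit signature rather than the full sparse vector, which is the communication-cost trade-off the paper studies.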
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Rajaram, S., Scholz, M. (2008). Client-Friendly Classification over Random Hyperplane Hashes. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science(), vol 5212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87481-2_17
Print ISBN: 978-3-540-87480-5
Online ISBN: 978-3-540-87481-2