Abstract
Personalized services built on fine-grained behavioral analysis must be balanced against users' privacy concerns. In this paper, we consider web traces with truncated URLs, in which each URL is trimmed to its web domain to remove sensitive user information. To offset the loss of accuracy in user activity profiling caused by URL truncation, we propose a statistical methodology that leverages specialized features extracted from a burst of consecutive URLs representing a micro user action. These bursts, in turn, are detected by a novel algorithm based on the characteristics we observed in the inter-arrival times of HTTP records. On a real dataset of mobile web traces comprising more than 130 million records from 10,000 users, we show that our methodology achieves around 90% accuracy in separating URLs that represent user activities from non-representative URLs.
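The abstract names two preprocessing steps: truncating each URL to its web domain, and grouping consecutive HTTP records into bursts by their inter-arrival times. The sketch below illustrates both under stated assumptions. It is not the authors' algorithm. The abstract does not specify the burst-detection rule, so this sketch substitutes a simple fixed gap threshold (`gap_threshold`, assumed to be 1.0 second). The names `HttpRecord`, `truncate_url`, and `detect_bursts` are hypothetical, and timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class HttpRecord:
    # Hypothetical record type: timestamp in seconds (assumed unit) and raw URL.
    timestamp: float
    url: str


def truncate_url(url: str) -> str:
    """Keep only the web domain (host) of a URL, dropping port, path and query."""
    # urlsplit only recognises the host when a scheme or a leading '//' is present,
    # so scheme-less URLs such as "example.com/path" get a '//' prefix first.
    if not url.startswith("//") and "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname  # lowercased host, or None if absent
    return host or ""


def detect_bursts(records, gap_threshold=1.0):
    """Group records into bursts of truncated URLs.

    Records are sorted by timestamp; a new burst starts whenever the
    inter-arrival time to the previous record exceeds gap_threshold seconds.
    This fixed cut-off is a hypothetical stand-in for the paper's algorithm.
    """
    bursts = []
    current = []
    last_ts = None
    for rec in sorted(records, key=lambda r: r.timestamp):
        if last_ts is not None and rec.timestamp - last_ts > gap_threshold:
            bursts.append(current)
            current = []
        current.append(truncate_url(rec.url))
        last_ts = rec.timestamp
    if current:
        bursts.append(current)
    return bursts


if __name__ == "__main__":
    records = [
        HttpRecord(0.0, "https://www.bbc.com/news/technology-25825690"),
        HttpRecord(0.3, "https://static.bbc.co.uk/img/logo.png"),
        HttpRecord(0.5, "ads.example.net/track?id=42"),
        HttpRecord(12.0, "https://www.alexa.com/topsites"),
    ]
    for burst in detect_bursts(records):
        print(burst)
```

Here the first three records arrive within 0.5 seconds and form one burst of domains, and the record at 12.0 seconds starts a new one. A fixed threshold keeps the example short. A method built on measured inter-arrival statistics, as the abstract describes, would presumably adapt this cut-off to the data. Per-burst features such as burst size or duration could then feed a classifier.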
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mai, T., Ajwani, D., Sala, A. (2015). Profiling User Activities with Minimal Traffic Traces. In: Cimiano, P., Frasincar, F., Houben, GJ., Schwabe, D. (eds) Engineering the Web in the Big Data Era. ICWE 2015. Lecture Notes in Computer Science, vol 9114. Springer, Cham. https://doi.org/10.1007/978-3-319-19890-3_9
DOI: https://doi.org/10.1007/978-3-319-19890-3_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19889-7
Online ISBN: 978-3-319-19890-3
eBook Packages: Computer Science, Computer Science (R0)