Abstract
Attackers mostly target users with vulnerable browsers, mounting client-side attacks through various exploitation means, of which dynamic client-side JavaScript is the most instrumental. In this paper, we present UAC (URL Analyzer and Classifier), a novel lightweight and browser-independent solution that combines static analysis with run-time emulation to identify malicious web pages. UAC performs a multi-faceted inspection of a web page, which includes DOM parsing to identify suspicious DOM elements such as hidden iframes and malicious links, JavaScript analysis to detect obfuscated and malicious behavior using function-call profiling based on supervised learning, tracking of dynamic domain redirections, and scanning for suspicious patterns. An active hunt for potential URLs to seed web pages is conducted using an integrated web crawler to cover the maximum cyber space for a given URL. The solution is deployed as a Low Interaction Honeyclient in a Distributed Honeynet System, where scalability is addressed using a hash-based redundancy check.
Keywords
- Static analysis for malicious websites
- Run-time emulation
- Low-interaction Honeyclient
- Web crawler
- Obfuscated and malicious JavaScript
- Suspicious DOM elements
- Signature scanning
- Machine learning
1 Introduction
The Internet has become the most popular medium of communication and a global information reservoir. With the increasing popularity of public social networking sites, everyone seems to congregate around the Internet to get their share of the web. Although the general impression is one of growing cyber security awareness among the masses, advanced and sophisticated hacker techniques easily counter common defensive mechanisms and fool users.
Malicious web contents today primarily target web clients with browser vulnerabilities. In particular, Drive-by-downloads [1] are a specific type of web-based client-side attack in which a web browser requests web pages from a remote web server. In response, the server returns a webpage to the browser that contains attack code exploiting a remote code execution vulnerability in the browser. If the malware is not delivered as part of the attack code's payload, a special payload called a downloader can first pull and then execute malware on the local workstation. The entire attack happens without the user's consent or notice. These attacks normally take advantage of the tight coupling of browser plug-ins with the browser environment. The browser's memory is physically shared with its various extensions, making it highly susceptible to heap spray [2] or similar attacks. The deterministic heap behavior allows the attacker to reliably assume complete control of the browser memory and, eventually, the entire system.
Detection of malicious websites primarily focuses on the following strategies:

- (a) Browser Built-in Protection: Browser Protection Plug-ins [3], Safe-Browsing like Google [4]
- (b) Static and Machine Learning Approaches: JavaScript Features [5, 6], HTML and URL Structural Processing [6], HTTP Communication Patterns [7], Pattern-Matching [8]
- (c) Memory Monitoring: Memory Corruption and Heap Spray Detection [9], Data Memory Protection [10]
- (d) Emulation-Based Mitigation Techniques: Browser Emulation with HTTP Response Verification, Sandboxing of Script Execution and Result Verification [11]
- (e) Impact Learning: Monitoring Downloaded Content Correlated with User Events [12], Un-consented Content Execution Prevention [13]
- (f) HoneyClients: HoneyC [14], PhoneyC [15], HoneySift [16], Monkey-Spider [17], Honeyware [18]
UAC (URL Analyzer and Classifier) is a lightweight solution that leverages static analysis combined with run-time emulation to identify malicious web pages. It performs inspection of a web page from multiple dimensions, which includes DOM parsing to identify potentially suspicious DOM elements including hidden iframes and malicious links, JavaScript analysis to detect obfuscated and malicious behavior, dynamic domain redirection tracking and scanning for suspicious patterns. UAC has the following features to offer:
Hybrid Analysis Framework. UAC offers a hybrid analysis capability to counter the code-hiding techniques employed by attackers and to cover a reasonably broad analysis domain. Run-time emulation provides a safe inspection environment and exposes dynamic behavior, whereas static analysis offers fast investigation.
Light-weight Approach. It has been tested with respect to system and performance measurements and has been shown to incur low overhead. It demands minimal system resources and takes around 20 s per analysis.
Supervised Learning-based model. The JavaScript analysis and its behavioral profiling are based on supervised learning models to deliver accurate results.
Distributed Deployment. The solution has been deployed as a Low Interaction Honeyclient at various geographical locations to permit distributed load balancing and capturing of targeted attacks (Region-specific attacks).
Scalable Solution. A hash-based technique that eliminates redundant URL analysis has been integrated. The architectural implementation also ensures that analysis is done at the client side and only the results are mapped to the central server, which reduces transmission load and consumes less network bandwidth.
Evaluated Version. It has been evaluated against various open-source Low Interaction Honeyclients and also against Google Safe Browsing. The results show that UAC is very effective in detecting malicious URLs, with a very low false positive rate of 0.2 % and a false negative rate of 0.08 %.
2 Related Work
Caffeine Monkey [19] is a Client-Side Honeypot technology to identify browser exploitation. It employs a JavaScript de-obfuscator, logger, and profiler to identify malicious websites; JavaScript behavioral analysis is based on function-call analysis. Whereas the common aspect of Caffeine Monkey and UAC is the use of function calls for JavaScript analysis, the significant difference lies in the selection of function calls. UAC makes use of 33 JavaScript function calls, which have been selected after rigorous experiments on various websites that download malware.
Binspect [20] makes use of emulation and static analysis to detect Drive-by-Download and phishing attacks. It employs machine learning models based on URL features, Page-Source features (HTML and JavaScript), and Social-Reputation features. UAC however analyzes the web page from the behavioral features rather than structural features for more accurate interpretation.
ZOZZLE [21], a fast and precise in-browser JavaScript Malware Detection is based on static JS analysis using function-call hooking in browser JS engine. Bayesian classification of hierarchical features in the form of JavaScript abstract syntax tree is used to identify syntax elements that are highly predictive of malware. However, it primarily addresses No-op and heap spray attacks. The obfuscation detection of JavaScript in UAC is primarily derived from “Automatic Detection for JavaScript Obfuscation Attacks in Web Pages through String Pattern Analysis” [22] that makes use of n-grams, entropy and string length to identify obfuscation in scripts.
JStill [23] enables detection of obfuscated JavaScript and function-invocation-based analysis to detect malicious JavaScript. It also highlights the discrepancies of browser-based mechanisms. However, its analysis is based on inspecting the arguments of dynamically invoked function calls. UAC, on the other hand, makes use of the statistical and sequential features inherent in function-call invocation, with obfuscation detection done in a separate thread.
“Knowing your enemy: understanding and detecting malicious web advertising” [24] has developed Mad Tracer for Spam, Drive-By-Downloads, and Click Frauds. It analyzes hidden iFrame injections and redirections. UAC also provides information of iFrames and malicious links but it identifies all iFrame and analyzes them according to their visibility index and structure. In addition, it also identifies suspicious links on a web page.
3 Problem Definition and Approach Adopted
Being a type of client-side attack, Drive-by-Download attacks need to be detected at the client side. The problem can be stated as the development of a client honeypot that (a) overcomes the challenge of multiple browser-OS combinations to detect the actual system exploit, (b) captures static and dynamic webpage contents, (c) inspects dynamic JavaScript behavior to detect mal-code and/or redirections, and (d) supports large-scale deployment, which demands a low-overhead and fast approach in addition to addressing scalability.
3.1 Approach Adopted
To address the above problem statement, UAC employs an emulated browser and JavaScript engine that facilitate the execution of URLs and JavaScript in a safe emulated environment without the need to configure a browser-specific environment. Emulation enables the capture of static and run-time (dynamically generated) web contents, including potentially malicious iframes and links. The JavaScript engine enables inspection of dynamic JavaScript behavior, thus defeating obfuscation and other code-hiding techniques used by attackers. The following challenges and their solutions provide an overview of the approach adopted:
3.2 Challenge 1: Overcoming the Challenge of Multiple Browser-OS Combinations to Detect Actual System Exploit
UAC is a browser-independent solution that utilizes emulated browser and JavaScript engine to facilitate the execution of URLs and JavaScripts in a safe emulated environment (protected from self-exploitation) without the need to configure browser-specific environment.
3.3 Challenge 2: Capturing Available and Generated (Static and Dynamic) Webpage Contents
Execution of URL using browser that is configured with DOM parser and JavaScript engine permits monitoring of static and run-time web contents including likely malicious iframes, links, and invoked scripts.
3.4 Challenge 3: Transient Malware Compromises Effectiveness of Static Analysis
Transient JavaScript malware can be effectively monitored during run-time where it renders its actual behavior. Hybrid analysis technique (static and run-time) is employed in UAC that exposes the dynamic behavior of webpage.
3.5 Challenge 4: Inspection of Dynamic JavaScript Behavior to Detect Mal-code and/or Redirections
Use of JavaScript engine in UAC enables the inspection of dynamic JavaScript behavior thus defeating the mechanisms of obfuscation and other code-hiding techniques used by attackers.
3.6 Challenge 5: Establish Significant (Legitimate and Illegitimate) JS Function-calls
Thirty-three JavaScript function calls have been selected after rigorous experiments (using a commercial sandbox) on JavaScript extracted from sites that drop malware. These function calls exhibit the most frequent occurrences in suspicious web sites.
3.7 Challenge 6: Scalability Aspects
Hash-based redundancy check has been applied in UAC to prevent redundant URL analysis.
4 UAC Design
Figure 1 illustrates the design of UAC, in which the input is a set of seed URLs that are crawled and then analyzed. The input URLs are executed using the emulated browser and relevant parameters are captured. UAC declares any site as "Likely Suspicious", "Suspicious", "Highly Suspicious", "Benign", or "Error". This classification is based on a final rule-set generated after URL analysis.
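The rule-set itself is not reproduced in this paper, but its overall shape can be sketched as a mapping from per-analyzer verdicts to the five labels. In this minimal illustration the verdict names and thresholds are assumptions, not UAC's actual rules:

```python
def uac_label(verdicts):
    """Map per-analyzer verdicts (a dict of booleans) to a final label.
    Verdict keys and the hit-count thresholds are illustrative only."""
    # An analysis error (e.g. fetch failure) short-circuits everything else.
    if verdicts.get("error"):
        return "Error"
    # Count how many independent analyzers flagged the page.
    checks = ("suspicious_dom", "malicious_js", "obfuscation", "signature_hit")
    hits = sum(1 for k in checks if verdicts.get(k))
    return ["Benign", "Likely Suspicious", "Suspicious", "Highly Suspicious"][min(hits, 3)]
```

A page flagged by no analyzer comes out "Benign"; each additional flag escalates the label up to "Highly Suspicious".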
4.1 URL Active Crawling
The active URL hunt is done using a web crawler that extracts web links from a given web page. URL crawling follows a standard algorithm that downloads website contents and extracts links based on recognized patterns.

An important challenge in the implementation of a web crawler is the selection of an optimum crawling depth. If the depth is too low, crawling is limited to a few sites; if it is too large, the crawler produces enormous overhead and becomes the bottleneck of the whole analysis process. Table 1 summarizes the experiments carried out to select the most suitable depth value. The processing overhead incurred by the web crawler can be averaged as:
- Time Consumption: 0.033 s/URL (average)
- Memory Consumption: 7.86 kB/URL (average)
From the table it can be concluded that a depth value of 2 maintains a balance between detection rate and processing overhead. However, the user is provided with an option to select a crawling depth between 0 and 3 during analysis.
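Assuming a standard breadth-first strategy, the depth-limited crawl can be sketched as follows. The link-extraction step is injected as a function so the sketch stays self-contained; a real deployment would fetch each page over the network and parse its anchors:

```python
from collections import deque

def crawl(seed, get_links, max_depth=2):
    """Breadth-first crawl from `seed`, following links up to `max_depth`.
    `get_links(url)` returns the URLs found on that page (a network fetch
    plus HTML parsing in a real crawler; injected here for illustration)."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth == max_depth:      # stop expanding at the depth limit
            continue
        for link in get_links(url):
            if link not in seen:    # avoid revisiting pages
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited

# A toy link graph standing in for real pages.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}
```

With `max_depth=2`, `crawl("a", lambda u: graph.get(u, []))` reaches "d" but never expands it, so "e" is only discovered at depth 3.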
4.2 Hash-Based Redundancy Checker
UAC is implemented as a distributed system, i.e., deployed at various geographical locations to capture location-specific attacks and to enable load distribution during peak operations. To scale the system, the initial URL seeding is implemented in the form of hash structures to prevent redundant URL analysis. Major DOM elements like <a>, <base>, <body>, <button>, <command>, <datalist>, <div>, <embed>, <form>, <iframe>, <li>, <link>, <object>, <source>, internal scripts, external scripts, asynchronous scripts, etc. are parsed as shown in Fig. 2.

These DOM elements have been cataloged based on the dynamicity and impact they exhibit on a website. The values are converted into a hash structure in the form of a string key. The hash map directly maps a given key (extracted after parsing the DOM structure of a site) to a classification if the site has been previously analyzed, so no further analysis is needed. If no matching key is found, the hash table is updated with the newly generated key. The updated hash table is mapped to each distributed location on a regular basis.
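A minimal sketch of this redundancy check, assuming the key is a hash over per-element counts of the parsed DOM (the exact serialization and element catalog UAC uses are not specified in the paper):

```python
import hashlib

# Illustrative subset; UAC catalogs many more DOM elements.
TRACKED = ("a", "iframe", "script", "form", "object", "embed", "link", "div")

def dom_key(element_counts):
    """Serialize per-element counts in a fixed order, then hash,
    yielding a stable string key for the page's DOM profile."""
    profile = "|".join(f"{tag}:{element_counts.get(tag, 0)}" for tag in TRACKED)
    return hashlib.sha256(profile.encode()).hexdigest()

cache = {}  # key -> classification from a previous analysis

def cached_classify(url, element_counts, analyze):
    key = dom_key(element_counts)
    if key in cache:            # same DOM profile seen before: reuse verdict
        return cache[key]
    cache[key] = analyze(url)   # full (expensive) UAC analysis
    return cache[key]
```

In the distributed setting the `cache` dictionary corresponds to the hash table that is synchronized to each node; a second page with an identical DOM profile skips the full analysis entirely.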
4.3 Hybrid Analysis Mechanism
In order to capture the actual behavior of a website, it is recommended that the site be executed in an emulated browser, if not a real one. This enables us to capture the run-time behavioral aspects of the URL. For this, the ELinks text browser [25] has been deployed, an open-source text-based browser, further configured with the SpiderMonkey [26] JavaScript engine, which is responsible for rendering and exposing the component object model for JavaScript. However, the browser and JavaScript engine functionality is used only to extract the relevant analysis parameters, which are evaluated later as shown in Fig. 3.
4.4 DOM Parsing to Detect Suspicious DOM Elements
The DOM parser, as shown in Fig. 4, monitors all the website components that become part of the DOM during URL execution. The DOM of a website defines the complete structure of the site; DOM elements may exist statically or may be generated dynamically. The DOM parser scans for the following suspicious elements.
1. Potentially Malicious iFrames

   Iframes add redirections to a site, and they are present either as static DOM elements on compromised sites or as dynamic DOM elements created through malicious script injections. The following iframes are considered potentially suspicious and are extracted:

   - Hidden iframes (with visibility index ranging from 0 to 2)
   - Likely malicious iframes of the form http://foreigndomain.com/location/resource_id=?, which are normally involved in delivering information to third parties or used as a means of exchanging some kind of identification.

2. Potentially Malicious Links

   - Links containing executable file extensions like .exe or .dll that lead to a binary drop on the system.
   - Links of the form http://foreigndomain.com/location/resource_id=?, which are potentially suspicious for the reasons stated above. All these links are first filtered against a whitelist (top-rated benign sites) and then stored in the database as potentially suspicious links.
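A hidden-iframe check along these lines might look like the following sketch, treating a rendered size in the 0-2 px range or a hiding style as suspicious. The attribute names and thresholds mirror the visibility-index idea above but are otherwise assumptions:

```python
import re

def iframe_is_hidden(attrs):
    """Return True if an iframe's attributes suggest it is invisible.
    `attrs` is a dict of attribute name -> value as parsed from the DOM."""
    # CSS-based hiding: display:none or visibility:hidden in the style attr.
    style = attrs.get("style", "").replace(" ", "").lower()
    if "display:none" in style or "visibility:hidden" in style:
        return True
    # Size-based hiding: width or height of at most 2 px.
    def px(name):
        m = re.match(r"(\d+)", str(attrs.get(name, "")))
        return int(m.group(1)) if m else None
    w, h = px("width"), px("height")
    return (w is not None and w <= 2) or (h is not None and h <= 2)
```

Iframes passing this check would then be recorded alongside the foreign-domain pattern matches described above.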
4.5 JavaScript Analysis
JavaScript adds dynamicity to a website because it is executed by the browser at the time of the URL visit; browsers generally incorporate a JavaScript engine that renders the code for a site. Due to this dynamic nature, JavaScript is responsible for more than 80 % of web attacks that involve client-side exploitation. Hence, it forms a critical part of the web contents to be analyzed exhaustively. The following analysis is performed on the JavaScripts extracted from a site:
1. Obfuscation Detection
Obfuscation hides the actual intent of a script by transforming the plain text. Its detection is significant since most malicious scripts are obfuscated to evade signature detection or even manual analysis. Figure 5 depicts an obfuscated script received during analysis.

The obfuscation detection is based on the following parameters:
(a) N-grams Mining

The 1-gram distribution is computed for each of the following character classes in JavaScripts:

- normal characters (u and x)
- numeric characters (0–9)
- special symbols (@, #, $, %, etc.)

Obfuscated scripts show a high concentration of these characters, hence their frequency distribution is a useful indicator.
(b) Entropy

The arguments of significant JavaScript function calls (found in malicious JavaScript) are captured and their entropy is calculated. Entropy is an indication of information gain; the use of obfuscated strings greatly reduces the entropy, which is why its calculation is important. Entropy is calculated based on the Shannon entropy concept [27] with the following formula:
$$E(B) = - \sum\limits_{i = 1}^{N} {\left( {\frac{{b_{i} }}{T}} \right)} \log \left( {\frac{{b_{i} }}{T}} \right)\left\{ {\begin{array}{*{20}l} {B = \{ b_{i} ,\quad i = 1 \ldots N\} } \hfill \\ {T = \sum\limits_{i = 1}^{N} {b_{i} } } \hfill \\ \end{array} } \right.$$
(c) Entropy Density

Entropy density matters because entropy alone may not provide complete information: the distribution of the entropy over the whole range of input bytes is also significant. It is calculated as:
$${\text{Entropy}}\;{\text{Density}} = {\text{Entropy}}/{\text{String}}\;{\text{length}}$$
-
(d)
Longest Word Length
-
Obfuscated strings generally utilize larger lengths because they have larger hexadecimal (or otherwise) distribution to represent any single character. All the above parameters are extracted and compared against machine-learned model. The model has been generated after due training using both benign and malicious samples. Trees-Random forest [28] is the learning algorithm employed in UAC which has been selected after intensive experimentations on the dataset using various learning algorithms. The selected algorithm provides least false positives and false negatives (as depicted in confusion matrix) during training. Table 2 provides an overview of the criteria used for selection of machine learning algorithms for various analysis mechanisms.
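The four parameters can be computed together in a few lines. This sketch uses log base 2 for the Shannon entropy and splits words on non-word characters, choices the paper does not pin down:

```python
import math
import re

def obfuscation_features(script):
    """Compute the four obfuscation indicators for a script string.
    Feature values like these feed the Random Forest classifier."""
    total = len(script) or 1
    counts = {}
    for ch in script:
        counts[ch] = counts.get(ch, 0) + 1
    # 1-gram densities over the character classes of interest.
    marker_density = (counts.get("u", 0) + counts.get("x", 0)) / total
    digit_density = sum(v for c, v in counts.items() if c.isdigit()) / total
    symbol_density = sum(counts.get(c, 0) for c in "@#$%") / total
    # Shannon entropy over character frequencies: E = -sum (b_i/T) log(b_i/T).
    entropy = -sum((v / total) * math.log2(v / total) for v in counts.values())
    return {
        "marker_density": marker_density,
        "digit_density": digit_density,
        "symbol_density": symbol_density,
        "entropy": entropy,
        "entropy_density": entropy / total,
        "longest_word": max((len(w) for w in re.split(r"\W+", script) if w), default=0),
    }
```

Hex-escape blobs such as `\x68\x65\x6c...` push the x-marker and digit densities well above those of ordinary source text, which is exactly the signal the classifier exploits.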
2. JavaScript Behavioral Profiling

Obfuscation is only an indication of malicious intent; the actual behavior still remains to be identified. The behavioral profiling of JavaScript is based on significant function (API) calls. Thirty-three significant function calls have been selected after extensive experiments on sites that drop malware (the drop being confirmed using commercial sandbox analysis); these primarily include eval, unescape, concatstring, undependstring, execute, setproperty, and so on. The function calls selected from malicious websites are further refined by comparison with the function calls most employed by benign sites. The following analysis is performed on these calls:
(a) Frequency Mining of Function Calls
The frequency distribution of the short-listed function calls in the JavaScripts extracted from websites is computed. A numeric reference-id is assigned to each function call and the distribution is compared with a machine-learned model, generated after due training using both benign and malicious samples. Experiments have been performed using various learning algorithms on the derived dataset; Meta-Rotation Forest [29] is the learning algorithm that provides the most effective true positive and true negative rates.
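A frequency-vector extraction consistent with this description might look like the following, where the call list and the numeric ids are an illustrative subset of the 33:

```python
from collections import Counter

# Illustrative subset of the significant calls, each with a numeric reference-id.
CALL_IDS = {"eval": 0, "unescape": 1, "concatstring": 2, "execute": 3, "setproperty": 4}

def frequency_vector(calls):
    """Build a fixed-length frequency vector over the shortlisted calls,
    suitable as one input row for the learned classifier."""
    counts = Counter(c for c in calls if c in CALL_IDS)
    vec = [0] * len(CALL_IDS)
    for call, n in counts.items():
        vec[CALL_IDS[call]] = n
    return vec
```

Calls outside the shortlist are simply dropped, so every script maps to a vector of the same length regardless of what else it invokes.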
(b) Sequence Mining of Function Calls
To determine the sequential behavior of the function calls, they are grouped into logical categories based on their functionality. Table 3 lists the 13 groups that have been identified. The grouping is important: to trace sequential function-call behavior, we need to trace the functional aspect irrespective of the specific call employed. For instance, string manipulation can be performed using numerous different calls. After the calls are mapped to their logical groups, sliding-window sequences are generated with window size 5; this size has been selected after experiments with window sizes of 2, 5, 10, 15, 20, 25, and 30. Trees-Random Forest [28] is the learning algorithm used for classification.
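The grouping-plus-sliding-window step can be sketched as follows; the group names here are illustrative stand-ins for the 13 groups of Table 3:

```python
# Map each call to its functional group (an illustrative subset of Table 3).
GROUPS = {
    "eval": "execution", "execute": "execution",
    "unescape": "decoding",
    "concatstring": "string-manipulation",
    "setproperty": "dom-modification",
}

def call_group_sequences(calls, window=5):
    """Replace each observed call by its functional group, then emit
    sliding windows of length `window` as sequence features."""
    groups = [GROUPS.get(c, "other") for c in calls]
    # range(...) is empty when the trace is shorter than the window.
    return [tuple(groups[i:i + window]) for i in range(len(groups) - window + 1)]
```

Because `eval` and `execute` map to the same group, two scripts using different execution calls in the same order produce identical sequence features, which is the point of the grouping.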
4.6 Signature Scanning
The HTML and extracted JavaScript contents are scanned against malicious signatures, which have been included from the following sources:
- (a) Self-Crafted Signatures: Currently 5 such signatures exist, which have been formulated from instances of JavaScript extracted from Drive-by-Download websites.
- (b) iScanner Signatures: iScanner [30] specifically contains signatures to detect malicious strings in the HTML DOM and JavaScripts.
- (c) Snort Signatures: Snort content-based JavaScript signatures [31] have been included in UAC.
- (d) Honeysift Signatures: Honeysift [16] is a low interaction Honeyclient which provides 19 malicious signatures for JavaScript.
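A signature scan over page content reduces to pattern matching. The two patterns below are illustrative stand-ins, whereas UAC loads the rule sets listed above:

```python
import re

# Hypothetical signatures; a real deployment loads the self-crafted,
# iScanner, Snort, and Honeysift rule files instead.
SIGNATURES = {
    "hidden-iframe": re.compile(r'<iframe[^>]*(?:width|height)\s*=\s*["\']?0', re.I),
    "eval-unescape": re.compile(r"eval\s*\(\s*unescape\s*\(", re.I),
}

def scan(content):
    """Return the names of all signatures that match the page content."""
    return [name for name, pat in SIGNATURES.items() if pat.search(content)]
```

Matched signature names feed into the final rule-set alongside the DOM and JavaScript verdicts.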
4.7 Redirection Domains and DOM Structural Graph
UAC additionally outputs all the redirections that were dynamically and automatically generated during the URL visit, with domain information extracted from DNS transactions. This provides an overview of all the sites involved in the infection cycle of a given malicious site and supplies a significant domain-redirection chain to incident-handling agencies.
DOM structural graphs can also be visualized as a tree for every URL, giving details of the DOM elements and their placement in the site. The graphs are generated in PNG format for every analyzed site.
4.8 Parallel Evaluation
UAC performs a parallel evaluation with Google Safe Browsing for every URL and presents the results to the user on the same analysis console, along with the date of the last Google validation for the site. Google declares a website as suspicious or benign and also provides additional information, such as domains acting as intermediaries for malware distribution or websites actively involved in transmitting infections. This facilitates benchmarking and comparison with UAC results.
4.9 Distributed Deployment
UAC is implemented as a Low interaction Honeyclient and has been integrated in Distributed Honeynet System (DHS). Currently, DHS nodes are operational at eight geographical locations across India. The distributed deployment is done through implementation of UAC as a virtual machine in DHS client node. The central analysis server performs the load balancing and load distribution to various nodes depending upon URL list.
The actual analysis is performed at the client and the results are mapped to a central analysis server on regular basis. This significantly reduces the transmission overhead and consumes less bandwidth and memory. Also, this system minimizes the operating cost of server.
5 Experimentations and Evaluations
5.1 Performance Measurement (Standalone Systems)
5.2 Performance Measurements (Distributed Systems)
See Table 6.
5.3 Evaluations with Respect to Other Low Interaction Honeyclient
UAC has been evaluated against other open-source Low interaction Honeyclients with respect to feature set and analysis capabilities. Table 7 presents the comparison results and depicts the effectiveness of UAC in detecting a large number of malicious URLs.
5.4 Experimental Evaluations
Lists of potentially malicious sites were derived from various sources, including CERT-In. These sites were analyzed by UAC and the results were shared with the incident response group, which also aids in validating UAC results. The following statistics have been generated from these experiments (Table 8).
5.5 Multi-threading Approach
A multi-threaded implementation permits still faster execution of UAC by exploiting the parallelism inherent in the program itself. Table 9 provides an overview of the UAC stages that are candidates for multi-threaded execution.

The performance improvement from using multiple threads is directly visible from the following measurements:
| | Latency (s) | Throughput (URLs/h) |
|---|---|---|
| With threading | 12 | 300 |
| Without threading | 20 | 180 |
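Since the per-URL analyses are independent, the parallel stages map naturally onto a thread pool. A sketch, with the `analyze` callable standing in for the UAC pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_batch(urls, analyze, workers=4):
    """Fan the independent per-URL analyses out across worker threads
    and collect the results keyed by URL."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(analyze, urls)))
```

Threads suit this workload because the stages are dominated by I/O (page fetches, signature file reads) rather than pure computation.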
6 Towards Signature Formulation
Anti-virus scanners detect attacks based on their signature databases. With the ever-growing diversification of attack code, it becomes useful and desirable to generate signatures for unknown attacks. The main goal of our approach here is to update the signature database of the open-source community anti-virus, i.e., ClamAV.
All the JavaScripts that are declared malicious by UAC are further validated by submission to the VirusTotal portal to determine whether popular anti-virus scanners also label them as malicious. The automated signature-generation mechanism filters out all scripts that are labeled malicious by popular antivirus engines but not by ClamAV. Subsequently, hexadecimal and hash-based signatures are generated for the filtered JavaScripts and populated into ClamAV to enhance its signature repository. This is a continual process that permits regular enrichment of the open-source signature repository.
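The hash-based part of the generation is straightforward: ClamAV's .hdb database entries take the form MD5:size:SignatureName, so a generated signature can be sketched as:

```python
import hashlib

def hash_signature(sample_bytes, name):
    """Build a ClamAV .hdb-style hash signature for a malicious sample:
    the MD5 digest, the file size in bytes, and a signature name,
    colon-separated."""
    digest = hashlib.md5(sample_bytes).hexdigest()
    return f"{digest}:{len(sample_bytes)}:{name}"
```

The hexadecimal body-based signatures (.ndb format) require choosing a distinctive byte sequence from the script rather than hashing the whole file, so they are not shown here.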
7 Conclusion and Future Work
UAC is a novel approach towards distributed and scalable analysis of URLs that leverages both dynamic execution (through emulation) and static analysis. UAC inspects the webpage from various perspectives, including suspicious DOM parsing and JavaScript analysis, and attempts to cover the maximum analysis domain. Other popular dynamic client-side scripting dialects such as JScript are accommodated easily in our analysis because they are based on the ECMA standard [32], and SpiderMonkey interprets ECMA scripts. We have also manually analyzed URLs declared benign by UAC to identify reasons for failures, and found that for most such sites the infection had already been removed by the time the site was analyzed by UAC. Further analysis processes, such as file analyzers for SWF, PDF, etc., could be integrated to inspect the complete downloaded web-code. Also, on some websites we came across malware injected in the form of VB scripts, which is currently outside our scope.
Distributed crawling is an area we can pursue further, making use of facilities like grid computing to perform large-scale analysis. The whole application could also be ported to a high-performance computing infrastructure to optimize the speed and performance of distributed computation.
References
Drive-by download—Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Drive-by_download
Egele, M., Wurzinger, P., Kruegel, C., Kirda, E.: Defending browsers against drive-by downloads: mitigating heap-spraying code injection attacks. In: Proceedings of DIMVA’09, 6th International Conference on Detection of Intrusions and Malware and Vulnerability Assessment, Milano, Italy, 9–10 July 2009. Springer LNCS
Secure Browsing, Malware Protection, Trustwave. https://www.trustwave.com/securebrowsing/
Google Safe Browsing. http://www.google.com/tools/firefox/safebrowsing/
Cova, M., Kruegel, C., Vigna, G.: Detection and analysis of drive-by-download attacks and malicious JavaScript code. In: Proceeding of the 19th International Conference on World Wide Web, pp. 281–290. ACM, New York (2010)
Canali, D., Cova, M., Vigna, G., Kruegel, C.: Prophiler: a fast filter for the large-scale detection of malicious web pages. In: Proceedings of WWW 2011. ACM, Hyderabad, India, 28 March–1 April 2011
Song, C., Zhuge, J., Han, X., Ye, Z.: Preventing drive-by download via inter-module communication monitoring. In: Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, ASIACCS’10, pp. 124–134. ACM, New York (2010)
Zhang, J., Seifert, C., Stokes, J.W., Lee, W.: ARROW: generating signatures to detect drive-by downloads. In: Proceedings of WWW 2011. ACM, Hyderabad, India, 28 March–1 April 2011. 978-1-4503-0632-4/11/03
Ratanaworabhan, P., Livshits, B., Zorn, B.G.: Nozzle: a defense against heap-spraying code injection attacks. In: Proceedings of the 18th Conference on USENIX Security Symposium, SSYM’09, pp. 169–186. USENIX Association, Berkeley (2009)
Wei, T., Wang, T., Duan, L., Jing, L.: Secure dynamic code generation against spraying. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS’ 10, pp. 738–740. ACM, New York (2010)
Dewald, A., Holz, T., Freiling, F.C.: ADSandbox: sandboxing JavaScript to fight malicious websites. In: Proceedings of the 2010 ACM Symposium on Applied Computing, SAC’10, pp. 1859–1864. ACM, New York (2010)
BLADE—Block All Drive-by Download Exploits. http://www.blade-defender.org/
Lu, L., Yegneswaran, V., Porras, P., Lee, W.: BLADE: an attack agnostic approach for preventing drive-by malware infections. In: Proceedings of the 17th ACM Conference on Computer and Communication Security, CCS’10, pp. 440–450. ACM, New York (2010)
Seifert, C., Welch, I., Komisarczuk, P.: Honeyc—the low-interaction client Honeypot. In: Proceedings of the 2007 NZCSRCS, Waikato University, Hamilton, New Zealand (2007)
Nazario, J.: PhoneyC: a virtual client Honeypot. In: Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and more, LEET’09, p. 6. USENIX Association Berkeley, CA (2009)
Forest, D., Weisen, C., Leong, K.P., Siang, H.Y.: HoneySift: a fast approach for low interaction client based Honeypot. StudyMode, 23 Jan 2011. http://www.studymode.com/essays/Honeysift-A-Low-Interaction-Client-Honeypot-558127.html
Ikinci, A., Holz, T., Freiling, F., Mannheim, G.: Monkey-Spider: detecting malicious websites with low-interaction Honeyclient. Sicherheit, Saarbruecken (2008)
Alosefer, Y., Rana, O.: Honeyware: a web-based low interaction client Honeypot. In: Proceedings of the 2010 Third International Conference on Software Testing, Verification, and Validation Workshops, ICSTW’10, pp. 410–417. IEEE Computer Society, Washington, DC (2010)
Feinstein, B.: Caffeine Monkey: Automated Collection, Detection and Analysis of JavaScript. Dell Secure-Works Inc., BlackHat USA, Las Vegas (2007)
Eshete, B., Villafiorita, A., Weldemariam, K.: BINSPECT: Holistic Analysis and Detection of Malicious Web Pages. SecureComm 2012, pp. 149–166 (2012)
Curtsinger, C., Livshits, B., Zorn, B.G., Seifert, C.: ZOZZLE: fast and precise in-browser JavaScript malware detection. In: USENIX Security Symposium (Microsoft Research) (2011)
Choi, Y., Kim, T., Choi, S., Lee, C.: Automatic detection for JavaScript obfuscation attacks in web pages through string pattern analysis. In: Future Generation Information Technology, Lecture Notes in Computer Science, vol. 5899, p. 160. Springer, Berlin (2009). ISBN 978-3-642-10508-1
Xu, W., Zhang, F., Zhu, S.: JStill: mostly static detection of obfuscated malicious JavaScript code. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy, CODASPY’13 (2013)
Li, Z., Zhang, K., Xie, Y., Yu, F, Wang, X.F.: Knowing your enemy: understanding and detecting malicious web advertising. In: ACM Conference on Computer and Communications Security 2012 (Microsoft Research), pp. 674–686 (2012)
Elinks—lynx-like alternative character mode WWW browser. http://manpages.ubuntu.com/manpages/lucid/man1/elinks.1.html
Spider Monkey, MDN. https://developer.mozilla.org/en/docs/SpiderMonkey
Chapter 6—Shannon entropy. http://www.ueltschi.org/teaching/chapShannon.pdf
Random Forest. http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/RamdomForest.html
Rotation Forest. http://weka.sourceforge.net/doc.packages/rotationForest/weka/classifiers/meta/RotationForest.html
iScanner. http://iscanner.isecurity.org
Snort. http://www.snort.org
ECMA Standards. http://www.ecma-international.org/publications/standards/Standard.htm
Acknowledgements
We are grateful to Dr. Bruhadeshwar Bezawada, Assistant Professor, IIIT, Hyderabad for his support, time-to-time guidance and periodic feedback on the analysis process. He has also suggested various improvements to address scalability.
We are also thankful to Mr. S. S. Sarma, Scientist ‘E’, Cert-In for providing useful inputs regarding the selection of significant parameters for analysis. Cert-In team has been regularly providing us the list of URLs and evaluating our results.
© 2014 Springer International Publishing Switzerland
Kaur, H., Madan, S., Sehgal, R.K. (2014). UAC: A Lightweight and Scalable Approach to Detect Malicious Web Pages. In: Silhavy, R., Senkerik, R., Oplatkova, Z., Silhavy, P., Prokopova, Z. (eds) Modern Trends and Techniques in Computer Science. Advances in Intelligent Systems and Computing, vol 285. Springer, Cham. https://doi.org/10.1007/978-3-319-06740-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06739-1
Online ISBN: 978-3-319-06740-7