Abstract
Botnets have become a serious security problem on the Internet, as their attacks lead to fraud, spam, identity theft, and information leakage. No intelligent classification knowledge graph of botnets has yet been created for integration into AI applications. We address this by integrating concepts from cybersecurity into AI. Using an ontology model, we designed concept classes, individuals, and object properties of botnets to construct a knowledge graph covering their classification, features, and attack types. Our technique extracts cybersecurity knowledge from various textual sources to populate the knowledge graph on botnets and their attack types. To construct our knowledge base, we use the Web Ontology Language (OWL 2 DL) for knowledge representation and the Resource Description Framework (RDF) as a standard model for metadata representation. The system then reasons over the knowledge graph, which combines a variety of collaborative agents, to derive improved results. We describe a proof-of-concept framework for our approach and demonstrate its capabilities by testing it against different attack types and botnet identification features. Our knowledge base will help researchers analyze botnet samples, understand the infection procedures of botnets, and measure the potential risk and possible damage of botnets.
1 Introduction
With the rapid growth of the Internet and the widespread use of connected devices, IoT devices are increasingly targeted by network attacks because they are often unsecured and can be recruited as bots. Botnets are one of the major cybersecurity threats facing web technologies today, as they provide a platform for launching numerous cyberattacks. Examples of these attacks include Distributed Denial of Service (DDoS), malware distribution, phishing, and click fraud, as well as ransomware and Fortnite attacks [1]. Hospitals, companies, and other organizations can also fall victim to botnet attacks, because botnets often use a command and control (C&C) architecture to coordinate the simultaneous encryption of files. Bots look for vulnerable, unpatched, and unprotected devices to compromise and attach to the C&C architecture.
The detection of botnets is an important research topic because botnets stay hidden until their Botmaster instructs them to execute an attack or task. Spammers have increasingly exploited bots to send phishing and spam emails [2, 3]. To combat these attacks, we propose a method that constructs a comprehensive botnet classification ontology and extracts knowledge from various textual sources to populate our knowledge graph on botnets and their attack types. We extract details about botnets from web sources such as cert.org and classify botnets based on their relations and features. Ontologies introduce machine-processing capability to knowledge structures [4] and are essential for semantic web engineering [5]. The ultimate goal of this effort is to develop an ontology of the botnet domain, expressed using the Semantic Web languages (OWL, RDF, SPARQL, and the Semantic Web Rule Language (SWRL)) as explained by [6], that enables the Pellet or HermiT reasoner to reason over the knowledge graph, which combines a variety of collaborative agents, to derive improved results and classify botnets based on features and attacks. The botnet ontology maintains a detailed information repository about botnet behavior, classification, features, and attack types. It provides an effective way for researchers to analyze botnet samples, further their understanding of botnet infection procedures, and measure the potential extent of damage caused by botnets.
The remainder of the paper is organized as follows. In Sect. 2, we give background on the semantic web, ontologies, botnets, classification, and attack patterns. In Sect. 3, we present our ontology and methods. In Sect. 4, we report experiments based on running queries. Finally, Sect. 5 presents conclusions.
2 Background
2.1 Ontology
Ontologies are frameworks for representing a shared conceptualization of knowledge across a domain [5]. They describe the concepts within a domain by representing classes and properties in taxonomies. The described concepts may include class properties and attributes, as well as the relationships among classes, which determine how the classes interact with each other. A class instance, or individual, is an instantiated example of a specific class. Ontologies play a vital role in the Semantic Web vision, where they provide semantic annotation of websites in a form that machines can interpret meaningfully [7]. Because ontologies can describe classes, relationships, and their high interconnectedness, they are well suited to modeling high-quality, linked, and coherent data.
2.2 Semantic Web
Berners-Lee et al. [8] explained that the Semantic Web gives meaningful structure to the content of Web pages in machine-readable formats so that software agents can carry out sophisticated tasks. The three foundational Semantic Web technologies are RDF, SPARQL, and OWL; SWRL and SQWRL, described below, build on them.
-
OWL is a family of Semantic Web languages that represents complex relationships among things and groups of things [9]. OWL adds semantics to the schema and relies heavily on reasoners. It offers a much larger vocabulary than RDF Schema, making it easy to say almost anything you might want to say about your data model.
-
RDF is a method for describing metadata using a standard data model. RDF is used to build knowledge graphs, machine-readable data repositories that can integrate large amounts of structured and unstructured data. An RDF statement (a triple) says something about its subject by linking it to an object through a predicate.
-
SPARQL is the standard query language and data access protocol for RDF databases. It can extract information from non-uniform data stored in various formats and sources by navigating relationships in RDF graph data through graph pattern matching.
-
SWRL is a proposed language for expressing rules and logic in combination with OWL on the Semantic Web. SWRL expressions require a SWRL-enabled reasoner such as Pellet or HermiT. A reasoner is software capable of inferring logical consequences from axioms: it helps determine whether the ontology is consistent, identifies subsumption relationships between classes, and more. Rule-based reasoning increases the inference capability of ontology-based models and is particularly valuable when semantic queries are executed.
-
SQWRL (Semantic Query-Enhanced Web Rule Language) is used for querying the ontology. It is based on SWRL and adds SQL-like operations that facilitate client queries [10].
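To make the triple model and SPARQL-style graph pattern matching more concrete, the following is a minimal pure-Python sketch. It is not SPARQL itself and not how rdflib or any triple store is implemented. The class names HTTPBotnet and IRCBotnet and the prefixed-name strings are hypothetical stand-ins for full IRIs; the botnet examples come from Sect. 2.3. Terms starting with "?" are variables, and `solve` joins several triple patterns the way a basic graph pattern in a WHERE clause does.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> matched term

# Illustrative triples; names are hypothetical, written as prefixed names.
TRIPLES: List[Triple] = [
    ("botnet:Zeus", "rdfs:subClassOf", "botnet:HTTPBotnet"),
    ("botnet:SpyEye", "rdfs:subClassOf", "botnet:HTTPBotnet"),
    ("botnet:DorkBot", "rdfs:subClassOf", "botnet:IRCBotnet"),
    ("botnet:HTTPBotnet", "rdfs:subClassOf", "botnet:Botnet"),
    ("botnet:IRCBotnet", "rdfs:subClassOf", "botnet:Botnet"),
]


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def extend(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, or return None on a mismatch."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            bound = new.get(p_term)
            if bound is None:
                new[p_term] = t_term          # first occurrence: bind it
            elif bound != t_term:
                return None                   # conflicts with earlier binding
        elif p_term != t_term:
            return None                       # constant term does not match
    return new


def solve(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (a nested-loop join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = extend(first, triple, binding)
        if extended is not None:
            yield from solve(rest, triples, extended)
```

Querying `[("?bot", "rdfs:subClassOf", "?family"), ("?family", "rdfs:subClassOf", "botnet:Botnet")]` binds ?bot to DorkBot, SpyEye, and Zeus. A real engine would index the triples instead of scanning all of them for every pattern.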
2.3 Botnet
A botnet is a set of infected computing devices under the control of an attacker (the bot herder or Botmaster) (Fig. 1). There is no minimum size for a group of infected computers to be called a botnet, and an individual computer in a botnet is generally called a “bot” or “zombie”. The Botmaster uses the command and control (C&C) channel to disseminate malicious commands to the bot army via the C&C server (as shown in Fig. 2). The C&C channel enables the Botmaster to remotely control the actions of a large number of bots to conduct various illicit activities; the bot malware itself is usually distributed via Web downloads and email attachments.
Botnets are often classified according to their communication protocols:
-
Internet Relay Chat (IRC) botnet—machines infected with malware that can be controlled remotely via an IRC channel. The IRC protocol mainly supports communication and data dissemination among large groups of users. Examples of bots that use this protocol are DorkBot, Gamebot, RageBot, and Phorpiex.
-
HTTP botnet—a web-based botnet that periodically disseminates commands through the HTTP protocol. The herder of an HTTP botnet masks the malicious activities as regular HTTP traffic. Examples are Zeus and SpyEye.
-
P2P (peer-to-peer) botnet—a newer generation of botnets in which bots share information and commands by contacting each other directly rather than a central C&C server. Relative to IRC and HTTP botnets, P2P botnets are harder to locate, monitor, and take down, since they do not rely on one centralized server. Examples are Mirai, Mariposa, and GameOver Zeus botnets.
Illegal activities that exploit botnets include DDoS, phishing, identity theft, and traffic spamming:
-
DDoS—an attack that aims to crash a target server: numerous bots send connection requests to the server to overwhelm it and prevent it from serving legitimate requests.
-
Phishing: a form of social engineering in which botnets are used to distribute malware via phishing emails.
-
Identity theft—an attack in which botnets are used to steal a victim’s identity; for example, identity theft bots act as keyloggers that record the user’s password during login and send it to the bot herders.
-
Traffic spamming—an attack that actively injects malicious code into HTTP traffic or passively gathers sensitive user information.
3 Methods
We developed an ontology for botnet classification based on RDF, OWL, SPARQL, and a Python framework. Using Protégé, we created an expressive botnet ontology by extracting important data about botnets from Internet sources such as wikis, blogs, newspapers, magazines, social networking sites, and video-sharing sites. Examples of the extracted data include the following:
-
Botnet indicators and detection methods,
-
Botnet attributes and characteristics,
-
Software vulnerabilities and security loopholes, and
-
Attack tactics.
The ontology captures semantic information from threat reports in a shared repository structure that facilitates collecting, aggregating, and analyzing the captured data. However, a machine cannot infer this knowledge from the text alone. Our cognitive approach addresses this issue by integrating a standard reasoning technique that detects inconsistencies during data sharing and infers new information from existing information. Following the Pizza ontology example, we created classes, individuals, and characteristics in the ontology.
3.1 Design of Botnet Class
As shown in Fig. 2, the Botnet class defines the concepts of botnet categories, botnet features, and attack types. Botnets are often classified according to their communication protocols, namely IRC, HTTP, and P2P. Each class represents a specific concept within the model, and a class can have subclasses; e.g., Zeus is a subclass of HTTP Botnet. The Attack class represents the harmful actions a botnet carries out on a compromised computer. It comprises several instances: DDoS, phishing, identity theft, malware distribution, traffic spamming, and cryptocurrency mining. The Feature class provides details about the features common to all types of botnets, as shown in Fig. 3.
Figure 4 shows the ontology of the Zeus class, a subclass of HTTP Botnet that inherits features from various classes. Figure 5 gives a complete description of the Zeus botnet, showing its canperform and hasfeature relationships.
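The inheritance described above is subsumption: Zeus is an HTTP Botnet, which is a Botnet, so anything stated about all botnets also holds for Zeus. A reasoner such as Pellet or HermiT derives this automatically; the plain-Python sketch below approximates only the transitive subClassOf part. It assumes, purely for illustration, that features are attached directly to classes (in OWL such statements would normally be restrictions on hasfeature), and the feature names are hypothetical rather than taken from the authors' ontology.

```python
from typing import Dict, List, Set

# Hypothetical fragment of the class hierarchy: class -> direct superclasses.
SUBCLASS_OF: Dict[str, List[str]] = {
    "Zeus": ["HTTPBotnet"],
    "HTTPBotnet": ["Botnet"],
    "IRCBotnet": ["Botnet"],
    "P2PBotnet": ["Botnet"],
}

# Hypothetical features asserted directly on each class (hasfeature values).
DIRECT_FEATURES: Dict[str, Set[str]] = {
    "Botnet": {"CommandAndControl"},
    "HTTPBotnet": {"HTTPCommunication"},
    "Zeus": {"Keylogging"},
}


def ancestors(cls: str) -> Set[str]:
    """All superclasses of cls: the transitive closure of subClassOf."""
    seen: Set[str] = set()
    stack = list(SUBCLASS_OF.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, []))
    return seen


def inherited_features(cls: str) -> Set[str]:
    """Features asserted on cls itself or on any of its superclasses."""
    features = set(DIRECT_FEATURES.get(cls, set()))
    for parent in ancestors(cls):
        features |= DIRECT_FEATURES.get(parent, set())
    return features
```

With these sample facts, `ancestors("Zeus")` is `{"HTTPBotnet", "Botnet"}`, and Zeus inherits CommandAndControl and HTTPCommunication in addition to its own Keylogging feature.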
3.2 Design of Object Property (Behavior Class)
Here, we define the main properties that represent relationships between concepts. The object properties capture the semantics of the sentences and connect the instances in the botnet classification. Each property is characterized by its domain and range, which restrict the classes it relates. Figure 5 shows the class Botnet with its two properties, hasfeature and canperform.
-
canperform: describes the types of attacks botnets can perform.
-
Domain: Botnet
-
Range: Attacks
-
hasfeature: connects botnets with their features.
-
Domain: Botnet
-
Range: Feature
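One reasoning step is worth spelling out: in RDFS and OWL, domain and range declarations act as inference rules rather than validation checks. If a triple uses canperform, a reasoner concludes that its subject is a Botnet and its object belongs to Attacks, instead of rejecting the triple; a contradiction only arises if that inferred type clashes with another axiom, such as a disjointness declaration. The sketch below applies just these two rules in plain Python, using the domain and range listed above. The individual zeusSample and the feature Keylogging are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Domain and range declarations from the property list above.
DOMAIN: Dict[str, str] = {"canperform": "Botnet", "hasfeature": "Botnet"}
RANGE: Dict[str, str] = {"canperform": "Attacks", "hasfeature": "Feature"}


def infer_types(assertions: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Apply the RDFS domain/range rules to (subject, property, object) facts.

    domain rule: if (s, p, o) holds and p has domain C, then s is a C.
    range rule:  if (s, p, o) holds and p has range D, then o is a D.
    Properties without declarations contribute nothing.
    """
    types: Dict[str, Set[str]] = {}
    for subj, prop, obj in assertions:
        if prop in DOMAIN:
            types.setdefault(subj, set()).add(DOMAIN[prop])
        if prop in RANGE:
            types.setdefault(obj, set()).add(RANGE[prop])
    return types


# Hypothetical individual-level facts.
facts = [("zeusSample", "canperform", "Phishing"),
         ("zeusSample", "hasfeature", "Keylogging")]
print(infer_types(facts))
```

Here zeusSample is inferred to be a Botnet, Phishing an Attacks member, and Keylogging a Feature, even though none of those types was stated explicitly.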
3.3 Ontology Description of Botnet Individual
In the ontology model, the attacks a botnet performs are individuals (instances) of the class Attack, including DDoS, phishing, identity theft, malware distribution, traffic spamming, and cryptocurrency mining. Using the canperform object property, we can relate a botnet classification to an attack pattern. Figure 6 shows the individual phishing and how it is used.
As shown in Fig. 2, the ontology displays the classes and subclasses (blue lines) and the individuals, i.e., the attack types (purple lines), connected by object properties (yellow dotted lines).
4 Experiment and Results
This section evaluates the ontology by running SPARQL queries against it to answer a number of questions:
Retrieving botnet names related to an attack type: the ontology is queried to extract the names of the botnets that can perform a phishing attack, using the object property canperform (Fig. 7).
In Fig. 8, lines 1–3 define prefixes at the top of the query so that URIs can be abbreviated and the query is more readable: bonet: http://www.semanticweb.org/ontologies/botnets# (the botnet ontology namespace); rdfs: http://www.w3.org/2000/01/rdf-schema# (RDF Schema, which provides a mechanism for describing groups of related resources and the relationships between them); and owl: http://www.w3.org/2002/07/owl# (the OWL vocabulary, used for creating more detailed descriptions of resources). Lines 5–9 contain the SELECT statement, which matches every botnet class that is declared, via rdfs:subClassOf, to have the OWL object property canperform with the value phishing. In plain language, it searches for botnets that can perform phishing.
Retrieving information on botnets and their features: we extracted the names of the different botnets and their features using the object property hasfeature.
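Both queries above walk a class-level pattern rather than a single triple. Assuming the capabilities and features are modeled in Protégé as hasValue restrictions (for example, Zeus SubClassOf canperform value Phishing), which is what the description of the WHERE clause suggests, each such axiom is stored as several triples around an anonymous restriction node, and the query must join through that node. The plain-Python sketch below reproduces that join for both queries. The triples are illustrative, not taken from the authors' OWL file, and `_:r1`/`_:r2` stand for blank nodes.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples for two axioms on Zeus:
#   Zeus SubClassOf (canperform value Phishing)
#   Zeus SubClassOf (hasfeature value Keylogging)
TRIPLES: List[Triple] = [
    ("botnet:Zeus", "rdfs:subClassOf", "botnet:HTTPBotnet"),
    ("botnet:Zeus", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "botnet:canperform"),
    ("_:r1", "owl:hasValue", "botnet:Phishing"),
    ("botnet:Zeus", "rdfs:subClassOf", "_:r2"),
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:onProperty", "botnet:hasfeature"),
    ("_:r2", "owl:hasValue", "botnet:Keylogging"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def classes_with_value(triples: List[Triple], prop: str, value: str) -> Set[str]:
    """Classes C such that C rdfs:subClassOf [owl:onProperty prop ; owl:hasValue value]."""
    result: Set[str] = set()
    for cls, pred, node in triples:
        if pred != "rdfs:subClassOf":
            continue
        is_restriction = "owl:Restriction" in objects(triples, node, "rdf:type")
        if (is_restriction
                and prop in objects(triples, node, "owl:onProperty")
                and value in objects(triples, node, "owl:hasValue")):
            result.add(cls)
    return result


def values_of(triples: List[Triple], cls: str, prop: str) -> Set[str]:
    """hasValue fillers of every restriction on prop that cls is a subclass of."""
    values: Set[str] = set()
    for node in objects(triples, cls, "rdfs:subClassOf"):
        if prop in objects(triples, node, "owl:onProperty"):
            values |= objects(triples, node, "owl:hasValue")
    return values
```

`classes_with_value(TRIPLES, "botnet:canperform", "botnet:Phishing")` corresponds to the phishing query and returns `{"botnet:Zeus"}`, while `values_of(TRIPLES, "botnet:Zeus", "botnet:hasfeature")` corresponds to the feature query and returns `{"botnet:Keylogging"}`.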
To list all the individuals in the ontology with SQWRL (Fig. 9), we use a SWRL-enabled reasoner (Pellet or HermiT) and create rule S2, which matches every instance of owl:Thing, selects it as ?i (each individual, i.e., each attack type), and displays the results in ascending order.
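Rule S2 is shown only in Fig. 9. Judging from the description, it corresponds to a SQWRL query of the form `owl:Thing(?i) -> sqwrl:select(?i) ^ sqwrl:orderBy(?i)` (our reconstruction, not copied from the figure). Because every individual is an instance of owl:Thing, such a query returns all individuals, sorted ascending. The Python sketch below computes the same kind of listing over a hypothetical set of type triples, with an optional class filter, for example to list only the Attack individuals.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical type triples: the six attack individuals named in Sect. 3.1,
# plus one feature individual and the class declarations themselves.
ATTACKS = ["Phishing", "DDoS", "IdentityTheft", "MalwareDistribution",
           "TrafficSpamming", "CryptocurrencyMining"]
TRIPLES: List[Triple] = [("botnet:Attack", "rdf:type", "owl:Class"),
                         ("botnet:Feature", "rdf:type", "owl:Class")]
for name in ATTACKS:
    TRIPLES += [(f"botnet:{name}", "rdf:type", "owl:NamedIndividual"),
                (f"botnet:{name}", "rdf:type", "botnet:Attack")]
TRIPLES += [("botnet:Keylogging", "rdf:type", "owl:NamedIndividual"),
            ("botnet:Keylogging", "rdf:type", "botnet:Feature")]


def list_individuals(triples: List[Triple], cls: Optional[str] = None) -> List[str]:
    """Named individuals (optionally only those of class cls), in ascending order."""
    facts = set(triples)
    named = {s for s, p, o in facts if p == "rdf:type" and o == "owl:NamedIndividual"}
    if cls is not None:
        named = {s for s in named if (s, "rdf:type", cls) in facts}
    return sorted(named)   # plays the role of sqwrl:orderBy(?i)
```

Class declarations such as botnet:Attack are not individuals, so they never appear in the result.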
4.1 Application of Ontology to Python
As shown in Fig. 10, we applied our ontology in Python. Lines 1–3 import RDFLib, a pure Python package that provides the main RDF types and their interfaces. The package offers a plugin interface for parsers, stores, and serializers, so that other packages can implement them and plug them into rdflib. The rdflib Graph class is the primary interface for working with RDF. Line 6 uses the parse method to read in the Botnet OWL file. Lines 8–20 hold the SPARQL query that extracts the list of botnets whose object property canperform has the attack type Phishing. Lines 22–24 print the result of the query, which matches that of Fig. 7.
5 Conclusion
In this paper, we designed botnet concept classes, individuals, and object properties and proposed methods for constructing a knowledge graph of botnets covering their classification, features, and attack types. Our technique extracts knowledge from various textual sources to populate the knowledge graph on botnets and their attack types. The ultimate goal of this effort is to develop an ontology of the botnet domain, expressed using the Semantic Web languages (OWL, RDF, SPARQL, and SWRL) as explained by Tim Berners-Lee, that enables the Pellet or HermiT reasoner to reason over the knowledge graph, which combines a variety of collaborative agents, to derive improved results and classify botnets based on features and attacks. The botnet ontology stores detailed knowledge about botnet behavior, classification, features, and attack types.
References
[1] Chowdhury S et al (2017) Botnet detection using graph-based feature clustering. J Big Data 4(1):14
[2] Levy E (2003) The making of a spam zombie army. Dissecting the Sobig worms. IEEE Secur Priv 1(4):58–59
[3] Alparslan E, Karahoca A, Karahoca D (2012) BotNet detection: enhancing analysis by using data mining techniques. In: Advances in data mining knowledge discovery and applications, p 349
[4] Jain S, Meyer V (2018) Evaluation and refinement of emergency situation ontology. Int J Inf Educ Technol 8(10):713–719
[5] Patel A, Jain S, Shandilya SK (2018) Data of semantic web as unit of knowledge. J Web Eng
[6] Berners-Lee T, Hendler J, Lassila O (2001) The semantic web. Sci Am 284(5):34–43
[7] Horrocks I (2008) Ontologies and the semantic web. Commun ACM 51(12):58–67
[8] Berners-Lee T, Hendler J, Lassila O (2001) The semantic web—a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. In: Scientific American, p 34
[9] McGuinness DL, Van Harmelen F (2004) OWL web ontology language overview. W3C recommendation, 10 February 2004
[10] Roda F, Musulin E (2014) An ontology-based framework to support multivariate qualitative data analysis. In: Computer aided chemical engineering, vol 33. Elsevier, pp 1891–1896
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Adekanmbi, O., Wimmer, H., Shalan, A. (2023). Semantic Web Ontology for Botnet Classification. In: Jain, S., Groppe, S., Bhargava, B.K. (eds) Semantic Intelligence. Lecture Notes in Electrical Engineering, vol 964. Springer, Singapore. https://doi.org/10.1007/978-981-19-7126-6_4
DOI: https://doi.org/10.1007/978-981-19-7126-6_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7125-9
Online ISBN: 978-981-19-7126-6
eBook Packages: Computer Science, Computer Science (R0)