1 Introduction

With the growth of the internet and the spread of connected devices, IoT devices are increasingly the target of network attacks because they are often unsecured and can be recruited as bots. Botnets are one of the major cyber-security threats facing web technologies today, and they constitute a platform for launching numerous cyberattacks. Examples of these attacks include Distributed Denial of Service (DDoS), malware distribution, phishing, and click fraud, as well as ransomware and Fortnite attacks [1]. Hospitals, companies, and other organizations can also become victims, because botnets often use a command and control (C&C) architecture to coordinate the simultaneous encryption of files. Bots look for vulnerable, unpatched, and unprotected devices to compromise and add to the C&C architecture.

The detection of botnets is an important research topic because botnets stay hidden until their Botmaster is ready to execute an attack or task. Spammers have increasingly exploited these bots to send phishing and spam emails [2, 3]. To combat these attacks, we propose a method that constructs a comprehensive ontology-based classification of botnets, extracting knowledge from various textual sources to populate a knowledge graph of botnets and their attack types. We extract details about botnets from web sources such as cert.org and classify botnets based on relations and features. Ontologies introduce machine processing capability to knowledge structures [4] and are essential for semantic web engineering [5]. The ultimate goal of this effort is to develop an ontology of the botnet domain, expressed using the Semantic Web languages (OWL, RDF, SPARQL, and the Semantic Web Rule Language (SWRL)) as explained by [6], so that the Pellet or HermiT reasoner can reason over the knowledge graph, which combines a variety of collaborative agents to derive improved results and classify botnets based on features and attacks. The botnet ontology maintains a detailed information repository about botnet behavior, classification, features, and attack types. It provides an effective way for researchers to analyze botnet samples, further their understanding of botnet infection procedures, and measure the potential extent of the damage botnets cause.

The remainder of the paper is organized as follows. Sect. 2 gives background on the semantic web, ontologies, botnets, classification, and attack patterns. Sect. 3 presents our ontology and methods. Sect. 4 reports experiments in which we run queries against the ontology. Finally, Sect. 5 presents conclusions.

2 Background

2.1 Ontology

Ontologies are frameworks for representing knowledge across a domain and are often defined as a shared conceptualization of that domain [5]. Ontologies help describe the concepts within a domain by representing the classes and properties of taxonomies. The described concepts may include class properties and attributes, in addition to the relationships among these classes, which determine how they interact with each other. A class instance is an individual instantiated example of a specific class. Ontologies play a vital role in the Semantic Web vision, where they provide semantic annotation of websites in a form that machines can interpret [7]. Because ontologies can describe classes, relationships, and their high interconnectedness, they are well suited to modeling high-quality, linked, and coherent data.

2.2 Semantic Web

Berners-Lee et al. [8] explained that the Semantic Web gives meaningful structure to the content of Web pages in machine-readable formats so that software agents can carry out sophisticated tasks. Its three foundational technologies are RDF, SPARQL, and OWL; SWRL and SQWRL, also described below, add rules and queries on top of OWL.

  • OWL is a Semantic Web language family that represents complex relationships among things and groups of things [9]. OWL languages add semantics to the schema and rely heavily on a reasoner. OWL offers a much larger vocabulary, which makes it easier to state precisely what is meant about a data model.

  • RDF is a method for describing metadata by using a standard data model. RDF is used to build knowledge graphs, which serve as a machine-readable data repository containing structured and unstructured data. An RDF statement says something about its subject by linking it, through a predicate, to an object.

  • SPARQL is the standard query language and the data access protocol for RDF databases. It can efficiently extract information hidden in non-uniform data and stored in various formats and sources by navigating relationships in RDF graph data through graph pattern matching.

  • SWRL is a proposed language that can be used to express rules and logic when combined with OWL in the Semantic Web. However, SWRL expressions require a SWRL-enabled reasoner such as Pellet or HermiT. A reasoner is a piece of software capable of inferring logical consequences from axioms. It helps determine whether the ontology is consistent, identifies subsumption relationships between classes, and more. A rule-based reasoner increases the inference capability of ontology-based models, which is especially useful when semantic queries are run.

  • SQWRL stands for Semantic Query-Enhanced Web Rule Language and is used for querying the ontology. SQWRL is built on SWRL, provides SQL-like operations for retrieving results, and facilitates client queries [10].
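To make the idea of RDF statements and SPARQL-style graph pattern matching concrete, the following is a minimal sketch in plain Python. It is only a toy illustration under stated assumptions: the tuples, the `match` helper, and the `?`-prefixed variable convention are hypothetical, and real systems use an RDF library and a SPARQL engine instead.

```python
# Toy triple store: each RDF-style statement is a (subject, predicate, object) tuple.
# All names are illustrative demo data, not taken from the paper's ontology file.
triples = {
    ("Zeus", "subClassOf", "HTTPBotnet"),
    ("HTTPBotnet", "subClassOf", "Botnet"),
    ("Zeus", "canperform", "phishing"),
    ("DorkBot", "subClassOf", "IRCBotnet"),
    ("DorkBot", "canperform", "DDoS"),
}


def match(pattern, store):
    """Yield variable bindings for one triple pattern (hypothetical helper).

    Terms starting with '?' are variables; every other term must match exactly.
    """
    for triple in store:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


# "Which subjects are linked to 'phishing' by the predicate 'canperform'?"
phishing_bots = sorted(
    b["?bot"] for b in match(("?bot", "canperform", "phishing"), triples)
)
print(phishing_bots)
```

A SPARQL engine generalizes this idea by joining several such patterns on their shared variables.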

2.3 Botnet

A botnet is a set of infected computing devices under the control of an attacker (the bot herder or Botmaster) (Fig. 1). There is no minimum size for a group of infected computers to be called a botnet, and an individual computer in a botnet is generally called a “bot” or “zombie”. The Botmaster can use the command and control channel to disseminate malicious commands to the bot army via the C&C server (as shown in Fig. 2). The C&C channel enables the Botmaster to remotely control the actions of a large number of bots to conduct various illicit activities. The bot malware itself is usually distributed via Web downloads and email attachments.

Fig. 1 Sample botnet diagram

Fig. 2 A sample botnet ontology model showing classes, subclasses, and relationships

Botnets are often classified according to their communication protocols:

  • Internet Relay Chat (IRC) botnet: machines infected with malware that can be controlled remotely via an IRC channel. The IRC protocol mainly allows communication and data dissemination among users of large social networks. Examples of bots that use this communication protocol are DorkBot, Gamebot, RageBot, and Phorpiex.

  • HTTP botnet: a web-based botnet that allows periodic dissemination of commands through the HTTP protocol. The herder of an HTTP botnet masks the malicious activities as regular HTTP traffic. Examples are Zeus and SpyEye.

  • P2P (peer-to-peer) botnet: a newer generation of botnets in which bots share information and commands directly with one another rather than through a central C&C server. Relative to IRC and HTTP botnets, P2P botnets are harder to locate, monitor, and implement since they do not rely on one centralized server. Examples are the Mirai, Mariposa, and GameoverZeus botnets.

Examples of illegal activities that can exploit botnets include DDoS, phishing, identity theft, and traffic spamming.

  • DDoS: an attack in which numerous bots send connection requests to a target server to overwhelm and crash it, preventing it from serving legitimate requests.

  • Phishing: a form of social engineering in which botnets are used to distribute malware via phishing emails.

  • Identity theft: an attack that steals a victim’s identity by using botnets; e.g., identity theft botnets include keyloggers that record the user’s password during login and send it to the bot herder.

  • Traffic spamming: an attack that actively injects malicious code into HTTP traffic or passively gathers sensitive user information.

3 Methods

We developed an ontology for botnet classification based on RDF, OWL, SPARQL, and a Python framework. Using Protégé, we created an expressive botnet ontology by extracting important data about botnets from internet sources such as wikis, blogs, newspapers, magazines, social networking sites, and video-sharing sites. Examples of the extracted data include the following:

  • Botnet indicators and detection methods,

  • Botnet attributes and characteristics,

  • Software vulnerabilities and security loopholes, and

  • Attack tactics.

The ontology captures semantic information from threat reports into a shared repository structure that facilitates collecting, aggregating, and analyzing the captured data. However, a machine cannot infer this knowledge from the text alone. Our cognitive approach addresses this issue by integrating a standard reasoning technique that will detect inconsistencies during data sharing and infer new information from existing information. Modeling after the Pizza Ontology example, we created classes, individuals, and characteristics in the ontology.

3.1 Design of Botnet Class

As shown in Fig. 2, the botnet class defines the concepts of botnet categories, botnet features, and attack types. Botnets are often classified according to their communication protocols, which are IRC, HTTP, and P2P. Each class represents a specific concept within the model. A class can have a subclass, e.g., Zeus is a subclass of HTTP Botnet. The attack class represents the negative effect a botnet has on a computer. It comprises several instances: DDoS, phishing, identity theft, malware distribution, traffic spamming, and cryptocurrency mining. The feature class provides valuable details about the features common to all types of botnets, as shown in Fig. 3.

Fig. 3 Botnet class

Figure 4 shows the ontology of the Zeus class, which is a subclass of the HTTP Botnet class and inherits features from various classes. Figure 5 gives a complete description of the Zeus botnet, showing its relationships through canperform and hasfeature.

Fig. 4 Zeus class

Fig. 5 Botnet object properties

3.2 Design of Object Property (Behavior Class)

Here, we define the main properties that represent relationships between the concepts. The object properties represent the semantics of the sentences and connect the instances in the botnet classification. Each property is constrained by its domain and range, which restrict the classes it can relate. Figure 5 shows the class Botnet with its two properties, hasfeature and canperform.

  • canperform: describes the types of attacks botnets can perform.

      • Domain: Botnet

      • Range: Attack

  • hasfeature: connects botnets with their features.

      • Domain: Botnet

      • Range: Feature

3.3 Ontology Description of Botnet Individual

In the ontology model, the attacks a botnet performs are individuals (instances) of the class Attack, including DDoS, phishing, identity theft, malware distribution, traffic spamming, and cryptocurrency mining. Using the canperform object property, we can create a relationship between the botnet classification and the attack pattern. Figure 6 shows the individual phishing and how it is used.

Fig. 6 Botnet individuals

As shown in Fig. 2, we created the ontology to show the classes and subclasses (indicated using the blue lines) and the individuals representing attack types (denoted using the purple lines), and gave them object properties (displayed using the yellow dotted lines).

4 Experiment and Results

This section evaluates the botnet ontology by running SPARQL queries against it to answer a number of questions.

Retrieving botnet names related to an attack type: The ontology is queried to extract the names of the different botnets that can perform a phishing attack using the object property canperform (Fig. 7).

Fig. 7 All botnets that can perform a phishing attack

In Fig. 8, lines 1–3 define prefixes at the top of the query so that URIs can be abbreviated and the query is more readable: bonet (http://www.semanticweb.org/ontologies/botnets#) contains the information on botnets; RDFS (http://www.w3.org/2000/01/rdf-schema#) provides a mechanism for describing groups of related RDF resources and the relations between them; and OWL (http://www.w3.org/2002/07/owl#) supports more detailed descriptions of resources. Lines 5–9 use a SELECT statement whose WHERE clause matches every Botnet that is declared, through rdfs subclass axioms, to have the OWL object property canperform with the value phishing. In plain language, the query searches for botnets that can perform phishing.

Fig. 8 List of botnets and their features

Retrieving information on botnets and their features: We extracted the names of the different botnets and their features using the object property hasfeature.

Listing all individuals with SQWRL: As shown in Fig. 9, we use a SWRL-capable reasoner (Pellet or HermiT) to run rule S2, which matches every member ?i of owl:Thing, that is, every individual such as an attack type, selects it, and displays the results in ascending order.

Fig. 9 SQWRL query

4.1 Application of Ontology to Python

As shown in Fig. 10, we applied our ontology in Python. Lines 1–3 import RDFLib, a pure Python package that provides the main RDF types and their interfaces. The package offers a plugin interface for parsers, stores, and serializers, so that other packages can implement these components and plug them into rdflib. Graph is the primary interface for working with RDF in rdflib. Line 6 uses the parse command to read in the Botnet OWL file. Lines 8–20 contain the SPARQL query that extracts the list of botnets for which the object property canperform has the attack type phishing. Lines 22–24 print the result of the query, which matches that of Fig. 7.

Fig. 10 Botnet ontology in Python

5 Conclusion

In this paper, we designed the concept classes, individuals, and object properties of botnets and proposed methods for constructing a knowledge graph of botnets, their classification, features, and attack types. Our technique extracts knowledge from various textual sources to populate the knowledge graph of botnets and their attack types. The ultimate goal of this effort is to develop an ontology of the botnet domain, expressed using the Semantic Web languages (OWL, RDF, SPARQL, and SWRL) as explained by Tim Berners-Lee, so that the Pellet or HermiT reasoner can reason over the knowledge graph, which combines a variety of collaborative agents to derive improved results and classify botnets based on features and attacks. The botnet ontology stores detailed behavioral knowledge about botnets, their classification, features, and attack types.