1 Introduction

The last two decades have seen immense growth in the use of the Internet and its capabilities. This growth has put a wealth of valuable information at users' fingertips. On the darker side, it has also opened a number of gateways through which hackers and attackers can enter a network or capture users' confidential information. These attacks not only cause monetary losses but also raise concerns for the security and integrity of nations, since the use of Information Technology has expanded to defense and administrative services. The attacks on high-profile companies such as Foxconn, Facebook, PayPal and the Sun grid [14] indicate the changing trends followed by attackers and highlight the need for an intrusion detection scheme that can detect new attacks, in contrast to the signature-based techniques commonly used.

Cases such as the Yahoo security breach, which exposed 450,000 usernames and passwords from a single site, show the loopholes in the existing security systems of high-profile companies [5]. They indicate that a considerable amount of work still needs to be done in developing a security model. The model should capture the trends followed by attackers and predict attacks based on these trends. Such prediction is only possible if we survey the activities performed by attackers, so that preventive measures can be designed and tools can be developed to stop them. Honeypots are efficient in capturing these activities, which can then be used for research on security analysis. Honeypots also provide a good way of filtering attack-related information, as they process no production data and capture the activities performed by the attacker along with the traffic directed towards or away from them [6]. The tracked activities can thus be of great value to network security personnel in developing methods for detecting zero-day attacks.

The paper presents an approach for roaming a honeypot in a network in order to deceive the attacker. The purpose of roaming, or shifting, the honeypot is to obtain the maximum amount of valuable information from it. Honeypots are deployed to capture the strategies and activities of attackers. If an attacker detects a honeypot on a system, the attacker will leave that system and may also broadcast this information to other attackers, which defeats the purpose of the honeypot. To prevent this, honeypots are not fixed on a particular system but are roamed within the network. The idea is to place the honeypot on the IP address that is most likely to be attacked, based on the current state of the network. Markov chain based analysis is used to estimate which system is most likely to be attacked and hence to decide the machine to which the honeypot is shifted; IP shuffling and service on/off are then used to implement the roaming of the honeypot in the network. Snort is used to capture the number and severity of attacks on each system of the network. Based on the number of attacks, the threat score is calculated as per Holsopple et al. [7] and provided as input to the Markov chain analysis. This roaming mechanism enhances the capability of the honeypot to capture high-value data for the analysis and detection of zero-day attacks. The honeynet has been implemented using the Sebek client and server [6] for the high-interaction implementation and honeyd [8] for the low-interaction implementation. Moreover, the reactive roaming approach based on Markov chains presented in the paper aims to keep the honeypot running on one of the most probable nodes for network attack at any instant [9].

The paper is organized as follows. Related work in this area is given in Sect. 2, followed by a brief description of honeypots and roaming honeypots in Sect. 3. Experimental setup details are given in Sect. 4, and the implementation of the Markov chain based honeypot roaming scheme is provided in Sect. 5. The results and discussion are given in Sect. 6. Finally, Sect. 7 presents the conclusion and future work.

2 Related Work

In [10], the impact of various attacks on systems and networks is described in terms of four factors, viz. CPU usage, memory usage, disk usage and network bandwidth utilization. A brief description of 146 attacks is also given.

An introduction to virtual honeypots and their utility in tracking the activities of hackers is provided in [8]. Abbasi et al. have given an overview of a Generation III virtual honeynet implemented using the Sebek client and server in [11]. A detailed overview of the Sebek client and server is presented in [12], which describes the capabilities of the Sebek tool and its installation and configuration process.

Zargar et al. [13] have identified the impact of certain TCP/IP headers in identifying various attacks. They have identified the accuracy with which they could identify the attacks with the help of certain parameters and principal component analysis for data reduction and selection. In [14] the authors have given an algorithm that detects attacks based on patterns produced on parallel coordinates. Parallel coordinates have been formed for source address, destination address, destination port and packet length and then patterns for well known attacks have been identified. The authors propose that these patterns can be used for identifying zero day attacks if the traffic pattern produced by a zero day attack is similar to the pattern generated by one of the known attacks. Kim et al. [15] have proposed an algorithm that identifies attacks based on the flow header information and traffic patterns. The algorithm involves setting of various threshold values that vary in different networks.

In [16], the concept of roaming honeypots is put forward. A model is proposed which determines IP flow legitimacy by identifying early features of DoS attacks. This helps in roaming the honeypots and making them undetectable. In [17], Khattab et al. have proposed roaming honeypots to lessen the severity of the effects of DoS attacks. In these attacks the attacking systems misuse the victim's server, causing great damage. The roaming honeypots scheme detects and filters attack traffic from outside as well as inside a firewall. A proactive roaming scheme is proposed in [18], which makes use of backward hash chains. The use of backward hash chains adds to the overhead of the system. This paper presents a much simpler approach that implements reactive roaming honeypots based on Markov chain analysis. It reduces the overhead because, in the reactive roaming approach, the honeypot is roamed/shifted to the machine where the severity of attack is higher.

3 Brief Introduction of Honeypots

A honeypot is a tool used to track an attacker's activity. It lures the attacker into attacking the system. Honeypots have no production value; their value lies in being probed, attacked or compromised. The main goal in deploying a honeypot is to get it attacked. It may act as a production honeypot within an organization's environment to help mitigate risk, or as a research honeypot, which adds value to research in computer security by providing a platform to study the threat. An important point is that a honeypot is not a solution to network attacks but a tool for analyzing network traffic so that data about zero-day attacks can be captured [19].

Low-interaction honeypots emulate services, network stacks, or other aspects of a real machine. They allow an attacker limited interaction with the target system and allow us to learn mainly quantitative information about attacks. Low-interaction honeypots are primarily used to gather statistical data and to collect high-level information about attack patterns. An attacker is not able to fully compromise the system, since he/she interacts only with a simulation. Low-interaction honeypots construct a controlled environment, and thus the risk involved is limited [8]. In our approach honeyd is used to implement low-interaction honeypots.

High-interaction honeypots offer a complete system to the attacker. A system deployed as a high-interaction honeypot does not emulate any services, base operating system or functionality. This approach gives an opportunity to learn about the tools, tactics and motives of the attacker. Although these honeypots give complete details about the attacker's activity, implementing them is costly and time-consuming [8]. In our approach we have implemented a virtual high-interaction honeypot using Sebek, as it gives an opportunity to perform an in-depth analysis of network traffic at a relatively low cost. The virtual honeynet used in the experiments is described next.

3.1 Virtual HoneyNet

A honeynet is a network of honeypots placed behind a reverse firewall. The reverse firewall is used to capture all inbound and outbound traffic of the honeypot network. It also limits the amount of inbound and outbound traffic flowing through the honeypots so that compromised honeypots cannot infect other systems on the network [20].

A virtual honeynet is a honeynet implemented using virtualization software. Such honeynets are called virtual because the different operating systems used to implement the honeypots on the host system give the appearance of independent computers [21]. Virtual honeynets offer the advantages of reduced cost and easier management. Also, because of virtualization, the attacker deals only with the virtual operating system, and the host operating system remains safe from the attacker. However, an attacker may be advanced enough to harm the host operating system as well, so virtualization provides only one level of security when implementing honeypots. Implementing all honeypots on a single computer may give the attacker complete control of the virtualization software and may lead to the compromise of the entire honeynet.

3.2 Roaming Honeypots

The purpose of creating honeypots is to trap the activities of attackers. However, honeypots cannot fulfill this purpose efficiently when they are deployed on fixed systems, which are detectable. On fixed systems, honeypots are not able to collect information about hackers attacking active computers. After some time an attacker may detect the honeypot on the system and broadcast this information to the attacker community. Once this is known, no attacker will try to attack that honeypot, and the honeypot becomes useless. The concept of roaming honeypots is proposed to address this. With this mechanism, honeypots keep roaming between different locations within a network, thereby becoming unpredictable and remaining undetectable by the attacker.

Roaming of honeypots refers to transferring the data capture daemon from one node of the network to another based on parameters such as network traffic, number of attacks and severity of attack. This approach not only helps in capturing the activities of attackers using the capabilities of the honeypot, but also increases the probability of capturing data on the system that has the highest probability of being probed at a particular instant of time.

4 Experimental Setup

In order to perform the experiments, we implemented a honeynet consisting of two virtual machines, one running Ubuntu Server 7.10 [22] and the other running Windows XP Professional [23]. The Sebek client [24] is configured on both virtual machines, which are set up using VMware Workstation 6.5.1 [25]. Another virtual machine is set up using the Roo 1.4 CDROM, which is based on CentOS version 5 and is used to configure the honeywall [26]. The honeywall acts as a gateway to the honeynet and can be configured to allow inbound and outbound traffic. The honeywall also provides a management interface named Walleye, a Graphical User Interface (GUI) that facilitates the management and configuration of the honeywall.

A detailed overview of the tools and the hardware used in the experiment is given in Table 1. The architectural details of the high-interaction honeypot scheme implemented are shown in Fig. 1.

Fig. 1 Implemented architecture

Table 1 Specifications of hardware used

For experimentation VMware virtualization software has been used which emulates a complete x86-based computer system and allows the user to run more than one operating system over a host operating system. The operating system that runs inside the host machine is usually known as the guest operating system [10]. VMware acts like a complete emulator and provides a virtualized set of hardware to the guest operating system. It is also possible to configure the hardware according to the needs of the guest operating system. Three types of network adapters can be configured on VMware.

  • Bridged: The guest machine is on the same network as the host machine.

  • Host-Only: It provides a network connection between the virtual machine and the host computer. It is helpful in creating an isolated virtual network.

  • NAT: NAT is used to set up a virtual private network between the guest operating systems which have access to the internet [25].

4.1 Sebek Server and Sebek Client

A Sebek server has been used to capture data from the honeypots (i.e. systems that have the Sebek client installed on them) so that an in-depth analysis of the data can be performed afterwards [6]. In our experimental setup we have used the Roo Honeywall CDROM, which has a pre-installed Sebek server on it. The Roo 1.4 honeywall is based on CentOS version 5 instead of Fedora (used in the previous versions), as CentOS has a longer support period. After installing Roo 1.4 on the virtual machine we configured it according to our network. The Roo Honeywall provides various pre-installed tools for data analysis, which are listed below.

  1. Snort [28]. Intrusion Detection System (IDS).

  2. Snort inline [28]. Intrusion Prevention System (IPS).

  3. Argus, p0f. Passive OS fingerprinting tool.

  4. Tcpdump [29]. Viewing of packet headers.

  5. Hflow2 [30]. A data coalescing tool for Honeynet data analysis.

  6. Walleye. Web based interface for Honeywall configuration, administration and data analysis.

  7. Sebek [31]. A data capture tool designed to stealthily capture the attacker's activities.

Configuration of the honeywall requires three network adaptors. In our setup we have used VMware to configure the three network adaptors. The use of each of the three network adaptors is described below:

  1. Eth0: The eth0 Ethernet interface is used to connect the honeywall to the external network or the internet. This interface has a hidden IP address so that it cannot be identified on the external network. eth0 uses bridged networking.

  2. Eth1: The eth1 Ethernet interface is used to connect the honeywall running the Sebek server to the honeynet containing the honeypots running the Sebek client. This interface also provides external connectivity to the honeypots through the honeywall. This interface does not have an IP address. The eth1 interface is a host-only interface.

  3. Eth2: The eth2 Ethernet interface is used as the management interface. This interface is helpful in remote management of the honeywall. The IP address of the eth2 interface is used to access the Walleye interface over HTTPS. The eth2 interface uses bridged networking.

The Sebek client is a kernel module that needs to be installed on the systems that are to be configured as honeypots. All inbound and outbound traffic from the Sebek client passes through the honeywall, i.e. the Sebek server. The main purpose of the Sebek client is to capture all data about the activities performed on the honeypot and send them to the Sebek server. In our experiments we have deployed two honeypots, one implemented on Ubuntu Server 7.10 and the other on Windows XP Professional. We downloaded the file sebek_disable_raw_socket_replacement-lin26-3.2.0b-bin.tar.gz for Ubuntu and the file Sebek-Win32-3.0.5.zip for installation on Windows XP from the projects.honeynet.org site. The virtual machines used to implement the Sebek clients are configured to use host-only networking in order to form a virtual network of honeypots that can be accessed only through the honeywall.

4.2 Walleye

Walleye is a management interface for the administration of honeypots as well as for the analysis of data collected on activities performed on the honeypot. Walleye runs on the eth2 interface of the Roo honeywall. It provides a GUI to perform all management and configuration activities on the honeywall. Walleye provides options to view flows, gives information about packet headers and also provides an interface for making changes to the configuration files of the honeywall. The Walleye GUI can be accessed through a browser at the uniform resource locator (URL) https://ip-address/, where the IP address is that of the eth2 interface of the honeywall, which is the remote management interface for Walleye.

5 Proposed Honeypot Roaming Schemes

The roaming scheme works by activating the Sebek daemon in the case of a high-interaction honeypot or the honeyd daemon in the case of a low-interaction honeypot. The basic approach can be explained in terms of the threat level, which is predicted with the help of Markov chain analysis in the proposed setup. Markov chain analysis was chosen for predicting the threat level on the systems in a network because it helps in predicting a future state. The approach uses a threat variable 's', which changes from 0 to 1 on a system when the threat level on that system is greater than the threat level on all other systems; after the honeypot is shifted, the value of 's' is changed back to 0 on all other devices. The block diagram of the proposed scheme is shown in Fig. 2, and the use of Markov chain analysis for predicting the threat level is discussed next.

Fig. 2 Proposed approach of reactive roaming honeypots

The threat level is predicted based on the priority of the probes reported by Snort for a particular IP address of the network. For this purpose a machine running Snort is deployed on the network to be monitored, and the honeypot is switched to the system most likely to be attacked. Snort classifies alerts on a scale of 1–3, with 1 being the most severe [32]. Using the Snort priorities and the number of alerts, we calculate the threat score, and the Markov chain approach is then used to implement roaming based on these threat scores. The threat score is calculated as follows [7]:

$$\begin{aligned} \textit{Threat score} = \sum _{n=1}^{3} (\textit{alertcount}_n) \times (2^{3-n}-1) \end{aligned}$$
(1)

where n is the priority of the attack and alertcount\(_n\) is the total number of attacks of priority n. The Markov chain is applied to N+1 states for the N IP addresses under consideration: N states correspond to each of the N IP addresses being under attack, and the \((N+1)^{th}\) state corresponds to no IP address being under attack. The most-attacked IP address is found by observing the threat score on each IP address for 50 time intervals. The transition probabilities are then calculated using the following two formulae:

$$\begin{aligned}&\displaystyle P_{i j}=Prob({X_{n+1}=j}|{X_{n}=i})\end{aligned}$$
(2)
$$\begin{aligned}&\displaystyle Prob({X_{n+1}=j}|{X_{n}=i})=\frac{\textit{No. of events}((X_{n}=i)\bigcap (X_{n+1}=j))}{\textit{Total events}(X_{n}=i)} \end{aligned}$$
(3)

where \(P_{ij}\) is the transition probability, i.e. the probability that the system, being in state i at moment n, passes to state j at moment n+1. The events \(X_n = i \) and \(X_{n+1} = j \) express, respectively, that the system is in state i at moment n and in state j at moment n+1. The event \((X_{n} = i)\bigcap (X_{n+1} = j)\) then expresses that the system is in state j after having been in state i.
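The counting estimate in Eqs. (2) and (3) can be illustrated with a short Python sketch. It is not the authors' implementation: the function name, the 0-based state indices and the sample sequence are assumptions made for illustration only.

```python
from collections import Counter


def estimate_transition_matrix(states, num_states):
    """Estimate P[i][j] = count(X_n = i and X_{n+1} = j) / count(X_n = i).

    `states` is the observed sequence of state indices (0-based here,
    an assumption of this sketch). Rows for states that never occur as
    a starting state are left as all zeros.
    """
    pair_counts = Counter(zip(states[:-1], states[1:]))  # (i, j) transitions
    from_counts = Counter(states[:-1])                   # visits to i with a successor
    matrix = []
    for i in range(num_states):
        total = from_counts[i]
        row = [pair_counts[(i, j)] / total if total else 0.0
               for j in range(num_states)]
        matrix.append(row)
    return matrix


# Hypothetical observation sequence over three states, for illustration only.
observed = [2, 0, 0, 1, 2, 0, 1, 1, 2]
P_hat = estimate_transition_matrix(observed, 3)
for row in P_hat:
    print([round(p, 3) for p in row])
```

Each row of the estimate sums to 1 whenever the corresponding state has been observed at least once with a successor, which is the property a transition probability matrix must satisfy.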

The transition probabilities, together with the initial probability vector for attacks on each IP address, are then used to find the most probable IP address for attack. The honeypot is then shifted to this IP address so that minimum damage is caused to the production systems and maximum data on the attacker's activities is collected on the honeypot.

5.1 Data Collection

In our experiments, we set up a subnet at the ABV-Indian Institute of Information Technology and Management, Gwalior, and observed the number of attacks on four IP addresses (192.168.39.81, 192.168.39.82, 192.168.39.83, 192.168.39.84). The results are presented over 50 time periods of duration 't', where each time period is taken to be 1 min (the time period can be increased or decreased). The threat score observed on each IP address is shown in Table 2.
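As an illustration of how a threat score of this kind is obtained from Snort alert counts, Eq. (1) can be written as a small Python function. This is a sketch only: the function name and the sample alert counts are hypothetical and are not taken from the experiment.

```python
def threat_score(alert_counts):
    """Threat score as in Eq. (1).

    `alert_counts` maps a Snort priority n (1, 2 or 3) to the number of
    alerts of that priority seen for one IP address in one time period.
    Missing priorities are treated as zero alerts.
    """
    return sum(alert_counts.get(n, 0) * (2 ** (3 - n) - 1) for n in (1, 2, 3))


# Hypothetical counts for one IP address in one period.
example_counts = {1: 2, 2: 5, 3: 10}
score = threat_score(example_counts)
print(score)  # 2*3 + 5*1 + 10*0
```

With the weights of Eq. (1) as printed, priority 1 alerts are weighted by 3, priority 2 alerts by 1, and priority 3 alerts by 0, so the lowest-severity alerts do not contribute to the score.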

Table 2 Results of the number of attacks per IP addresses collected from Snort IDS

From Table 2, we find the most-attacked IP address for each time period, under the assumption that in the starting periods (say P1, P2, P3, and P4) no two IP addresses have the same threat score. If during any period all IP addresses have a zero threat score, i.e., no IP address is under attack, the most-attacked IP address is recorded as 0. If two or more IP addresses have the same threat score in any period, we take the most-attacked IP address to be the one that was attacked the least during the earlier periods. Table 2 shows the threat score for each of the IP addresses, collected from the Snort Intrusion Detection System (IDS). The most-attacked IP address found for each time period is shown in Table 3.
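The selection rules above can be sketched in Python as follows. The sketch makes two assumptions that the text does not fix: "attacked the least" is read as the lowest cumulative threat score over the earlier periods, and any remaining tie goes to the lower-numbered IP address. The per-period scores are hypothetical and are not the values of Table 2.

```python
def most_attacked_per_period(scores_by_period):
    """Return the most-attacked IP (1-based index) for each period.

    `scores_by_period` holds one list of threat scores per period, one
    score per IP address. A period in which every score is zero is
    labelled 0 (no IP address under attack).
    """
    if not scores_by_period:
        return []
    cumulative = [0] * len(scores_by_period[0])
    labels = []
    for scores in scores_by_period:
        top = max(scores)
        if top == 0:
            labels.append(0)
        else:
            tied = [i for i, s in enumerate(scores) if s == top]
            # Assumed tie-break: lowest cumulative score in earlier periods;
            # min() keeps the lowest index if that is tied as well.
            winner = min(tied, key=lambda i: cumulative[i])
            labels.append(winner + 1)
        cumulative = [c + s for c, s in zip(cumulative, scores)]
    return labels


# Hypothetical threat scores for four IP addresses over four periods.
periods = [
    [5, 2, 0, 1],
    [0, 0, 0, 0],
    [3, 3, 1, 0],
    [0, 4, 4, 4],
]
labels = most_attacked_per_period(periods)
print(labels)
```

The resulting sequence of labels is the kind of state sequence from which the transition probabilities of the Markov chain are counted.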

Table 3 List of the most-attacked IP address

5.2 Markov Chain Analysis Modelling

In our experiments, there are five states to model the Markov chain. There are four states corresponding to the four IP addresses which are our pre-defined IP addresses and the fifth state is when no IP address is under attack.

From Eqs. (2) and (3) and the data of Tables 2 and 3, we can calculate the transition probability matrix at any time instant. It is given as follows:

$$\begin{aligned} P = \left( \begin{array}{ccccc} 0.231 &{}0.231 &{}0.207 &{}0.077 &{}0.154 \\ 0.375 &{}0.125 &{}0.125 &{}0.125 &{}0.250 \\ 0.333 &{}0.083 &{}0.167 &{}0.250 &{}0.167 \\ 0.428 &{}0.143 &{}0.286 &{}0.000 &{}0.143 \\ 0.000 &{}0.222 &{}0.444 &{}0.222 &{}0.111 \end{array} \right) \end{aligned}$$
(4)

We assume that initially no IP address (system/node) in the network is under attack.

So, we get an initial attack probability as

$$\begin{aligned} \begin{array}{ccccc} A(0)=[0&0&0&0&1] \end{array} \end{aligned}$$

The probabilistic state of the system after time duration 't' can then be predicted as:

$$\begin{aligned} A(t)=A(0)P \end{aligned}$$

Thus, we get

$$\begin{aligned} \begin{array}{ccccc} A(t) =[0.000&0.222&0.444&0.222&0.111] \end{array} \end{aligned}$$

So, it can be predicted that IP address 192.168.39.83 is most-attacked. Figure 3 shows the probability of attacks on the four IP addresses in percentage at a time instant, and helps in roaming the honeypot to the IP address which is likely to be attacked the most.
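The prediction step can be reproduced with a few lines of Python. The sketch below copies the matrix of Eq. (4) verbatim; its first row as printed sums to 0.900 rather than 1, which does not affect this calculation because A(0) selects only the fifth row. The helper name `step` and the list layout are assumptions of the sketch.

```python
# Transition matrix as printed in Eq. (4); states 1-4 are the four IP
# addresses and state 5 is "no IP address under attack".
P = [
    [0.231, 0.231, 0.207, 0.077, 0.154],
    [0.375, 0.125, 0.125, 0.125, 0.250],
    [0.333, 0.083, 0.167, 0.250, 0.167],
    [0.428, 0.143, 0.286, 0.000, 0.143],
    [0.000, 0.222, 0.444, 0.222, 0.111],
]
A0 = [0, 0, 0, 0, 1]  # initially no IP address is under attack

ip_addresses = ["192.168.39.81", "192.168.39.82",
                "192.168.39.83", "192.168.39.84"]


def step(dist, matrix):
    """Row vector times matrix: one step of the Markov chain."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


A1 = step(A0, P)
# Only the first four entries correspond to IP addresses.
best = max(range(len(ip_addresses)), key=lambda k: A1[k])
predicted_ip = ip_addresses[best]
print(A1, predicted_ip)
```

The third entry of the result is the largest of the four IP-address entries, which matches the prediction of 192.168.39.83 stated above.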

Fig. 3 Details of attack on IP addresses

5.3 Roaming Schemes

The roaming schemes have been implemented using two techniques. (i) IP Shuffling (ii) Service ON/OFF. The IP shuffling and Service ON/OFF techniques are used for the implementation of roaming based on the probabilities calculated by the markov chain analysis method. The details of these techniques are described below.

5.3.1 Honeypot Roaming Using IP Shuffling

In order to implement roaming of the honeypot, the most probable IP address for attack was first identified using Markov chain analysis and then freed by modifying the NAT table at the router. The freed IP address was then assigned to the honeypot system so that all the data pertinent to the attack is collected on the honeypot system. The main advantage of this approach is that all the data can be collected at a centralized location, although there may be a small delay in assigning the IP address to the honeypot system.
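The bookkeeping behind IP shuffling can be modelled with a plain dictionary. This is only a conceptual sketch: it does not touch a real router, the NAT table is assumed to be a mapping from IP address to host name, and the host names are hypothetical. What happens to the displaced production host is left to the caller, since the text does not specify it.

```python
def shuffle_ip(nat_table, target_ip, honeypot_host):
    """Return (new_table, displaced_host) after moving the honeypot.

    Any address currently mapped to the honeypot is released, and
    `target_ip` is remapped to the honeypot. `displaced_host` is the host
    that previously owned `target_ip`, or None if the address was unused.
    """
    new_table = {ip: host for ip, host in nat_table.items()
                 if host != honeypot_host}
    displaced_host = new_table.get(target_ip)
    new_table[target_ip] = honeypot_host
    return new_table, displaced_host


# Hypothetical mapping; host names are placeholders.
nat = {
    "192.168.39.81": "prod-a",
    "192.168.39.82": "prod-b",
    "192.168.39.83": "prod-c",
    "192.168.39.84": "honeypot",
}
new_nat, displaced = shuffle_ip(nat, "192.168.39.83", "honeypot")
print(new_nat, displaced)
```

Because the honeypot host itself never changes, all captured data stays on one machine, which corresponds to the centralized collection mentioned above.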

5.3.2 Honeypot Roaming Using Service ON/OFF

This scheme activates the data capture daemon of the honeypot on the machine that is most probable for attack and turns off the daemons at all other nodes of the network. In the case of a low-interaction honeypot this is done by manipulating the honeyd daemon, whereas in the case of a high-interaction honeypot the service to be activated or deactivated is the sebekd daemon. The advantage of this scheme is that the switching takes place almost instantaneously through a script that runs on the system itself; its drawback is that the collected data is distributed over several systems.
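The switching decision can be sketched in Python using the threat variable 's' from the proposed scheme. The sketch only computes which nodes should have the data capture daemon started or stopped; the actual start and stop commands for honeyd or sebekd depend on the installation and are deliberately left out. Function names are assumptions of the sketch.

```python
def roaming_plan(nodes, predicted_ip):
    """Return {ip: s} with s = 1 on the predicted node and 0 elsewhere."""
    return {ip: 1 if ip == predicted_ip else 0 for ip in nodes}


def switching_actions(previous, current):
    """List the ("start" | "stop", ip) actions needed to go from one plan to the next."""
    actions = []
    for ip, s in current.items():
        old = previous.get(ip, 0)
        if s == 1 and old == 0:
            actions.append(("start", ip))
        elif s == 0 and old == 1:
            actions.append(("stop", ip))
    return actions


nodes = ["192.168.39.81", "192.168.39.82", "192.168.39.83", "192.168.39.84"]
before = roaming_plan(nodes, "192.168.39.81")
after = roaming_plan(nodes, "192.168.39.83")
actions = switching_actions(before, after)
print(actions)
```

Only the nodes whose 's' value changes need any action, which is consistent with the near-instantaneous switching described for this scheme.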

6 Results and Discussion

The main advantage of this approach is that it strengthens the ability of the honeypot to capture data of high value. Compared to a static honeypot setup, this approach is less likely to be fooled by the attacker: even if the attacker detects the honeypot and stops attacking it, the predictive roaming scheme will have moved the honeypot to the system that is most likely to be attacked, so the attacker's activities can still be followed and captured. In addition, the Markov chain model used to predict the most valuable system for data collection is well established and widely accepted, and the roaming method uses it to predict the next likely attack IP address.

The approach is based on Markov chain analysis for predicting the probable IP address for attacks. The predicted IP address to which the honeypot is roamed in each of the time periods is shown in Table 4, along with the attack probabilities for each of the four IP addresses considered. The IP address predicted by Markov chain analysis is then used as the IP address of the honeypot with the help of Service ON/OFF or IP shuffling. We have used a time interval of 10 min to roam the honeypots. This setup has been deployed for a period of more than 2 months, and approximately 5 GB of data has been captured. Presenting all the results is beyond the scope of the paper, so partial results are presented to illustrate the proposed roaming scheme.

Table 4 Predictions made by Markov chain analysis for roaming

In order to compare the proposed approach with a static honeypot setup, we deployed a static honeypot and roaming honeypots using the IP shuffling and Service ON/OFF approaches over the same period of time on the same network, and then compared the number of flows to or from the honeypot in each case. The data on the number of network flows was collected using the Walleye interface in the case of the high-interaction honeypot and by analysing the honeyd log files with a Python script in the case of the low-interaction honeypot. The number of flows is a naive measure and does not guarantee that each flow is a probable attack. To compensate for this disadvantage, we kept the network idle for the test period of 5 h so that most of the flows can be regarded as malicious. The number of flows captured by the honeypot in each of the three approaches is shown in Table 5.

Table 5 Comparison of static and roaming honeypot schemes

It can be inferred from Table 5 that, in the case of the static honeypot, the number of flows was moderate in the initial hour and then increased considerably in the second hour. Most probably this was the time when the attacker saw the static honeypot as a production system. The steep fall in the number of flows in the third hour and the considerable decrease thereafter can be attributed to the attacker discovering the honeypot and moving on to some other system. As many attackers communicate such information amongst themselves, the value of a static honeypot decreases considerably once it has been discovered by an attacker. In the case of roaming honeypots, by contrast, the number of flows related to the honeypot kept increasing, because the honeypot follows the attacker: as the attacker detects a honeypot on one IP address, the honeypot moves on to another IP address as predicted by the Markov chain model.

7 Conclusion and Future Work

The approach proposed in this paper helps in understanding the attacker's activities in more detail, as the location of the honeypot remains undetectable to the attacker and the honeypot captures the maximum details of the attacker's activities. Markov chain analysis is one of the most widely accepted predictive models and is used to describe a sequence of events based on the transition probability matrix at a given time. This makes it possible to determine the most probable IP address to be attacked during a given period based on the threat scores. It helps in predicting the next location to which an attacker may move and can thus be of great use in following the attacker's motives. The contribution of this paper is therefore in increasing the efficiency of the honeypot system in capturing the attacker's activities, which can be of great value in the detection of zero-day attacks. The paper proposes a generalized approach for roaming honeypots in conjunction with Markov chain analysis; as future work, it can be combined with other decision support techniques for predicting the honeypot location. The captured data can be analysed for types of attacks and used to develop tools to secure the network.