1 Introduction

In this present era of computing, information is a major asset for every organization. From the local area network to the currently available highly connected internet, the world is being benefited from the easiness of data storage and access. With the introduction of cloud computing, maintenance of data has also become a simple task. However, this flexibility introduces a problem of data security as a major issue. This is due to the fact that intruders and hackers are also enjoying technologies for their security-threatening activities. And also, by subscribing to the cloud, cloud consumers permit external sources to hold control of their data. This creates room for hurdles. So data security and privacy are very serious problems in the victory of an industry process. Hence, organizations are using different solutions to achieve data security.

An efficient intrusion detection system should be fast, self-monitored, fault tolerant, easily configurable, difficult to cheat, available without interruption, and free from false errors with an overhead as minimum as possible. Its aim is to evaluate information systems and to perform early detection of malicious activity for reducing the security risk to an acceptably low level. High false-positive alarm rate may disrupt information availability, whereas high false-negative alarm rate may result in serious damage to the protected systems in the form of inappropriate access to sensitive information and data damaging. The performance of IDS is based on the amount of sufficient log data, its continuous updates on them, and the correct and quick detection of intrusion from the comparison between current activity of the user and the historical data.

The rest of this paper is organized as follows: Sect. 2 outlines IDS in cloud. The concept of genetic algorithm-based fuzzy neural architecture in intrusion detection is described in Sect. 3 and our proposed method is described in Sect. 4. Results of experiments are presented and analyzed in Sect. 5. At last, Sect. 6 concludes our work.

2 Intrusion Detection in Cloud

For the purpose of achieving a green world and also due to the invention of new technologies, paper-work is vanishing from existence and the concept of digital data is introduced and is gaining importance. But these digital data may require hardware, software, and networks for their storage and access which may lead to practical complexities. A potential solution to these issues is a cloud storage service. This service offers several benefits including user-friendly access, maintenance and sharing of large volume of data, synchronization of various devices, and more importantly, cost efficiency. Large amount of sensitive information can be stored on computers. This information stored on the cloud may be precious to others with malicious motive. So cloud providers follow various methodologies to solve these issues. But, any unauthorized element including a provider should not gain access to cloud storage, since a small damage or loss of data may be a potential harm to its owner. Firewalls and encryption are the most commonly used technologies for providing security and privacy. But they may not be of enough efficiency to protect data against intrusions. However, they can be used at the earlier part of the intrusion detection process.

Machine learning is one of the methods adopted to guide the system for detecting intrusions. A grid and cloud computing intrusion detection system is proposed in [1] to detect intrusions by employing an artificial neural network for teaching the system. An IDS with the central management approach [2] has been developed to address heterogeneity and virtualization features of cloud computing. In [3], a model based on hypervisor has been proposed for protecting the system from diverse attacks in the infrastructure layer which improves the system. An individual IDS is suggested in [4] where a single controller manages the instances of IDS, exploiting the knowledgebase and ANN pattern matching techniques. The fully distributed intrusion detection system developed in [5] implements a hybrid approach for detecting intrusions using host and network-based audit information for cloud computing. Numerous cloud intrusion detection techniques function on different layers of clouds independently [6].

Autonomic clouds have emerged as an outcome of integrating autonomic computing and cloud computing, yielding in stable and friendly cloud architectures and deployments. So they have contemporarily fascinated researchers to compose cloud intrusion detection engines with negligible human intervention. Network virtualization concept is handled in [7] for a cloud-based intrusion detection and prevention system without any control over the host. Table 1 portrays the drawbacks of current intrusion detection solutions, whereas our proposed method accomplishes the following benefits:

Table 1 Existing intrusion detection approaches

2.1 Clustering and Concept Learning

Based on mentioned criteria, grouping of elements is done through which the system can be trained to classify intrusion detection dataset into five groups (one group of normal patterns and four groups of intrusive patterns). This is achieved through data reduction, data analysis, and data classification.

2.2 Predictive Learning

From the temporal intrusion detection dataset and several sequences of resource access events, the system learns intrusive and non-intrusive behaviors of users. This is done by understanding even small changes that occur in the behavior of users, which could be impossible in expert systems.

2.3 Modularity

We have used the structured methodology for intrusion detection with four phases, where each phase gathers its input from its previous phase. However, the processing of each phase is completely isolated from the remaining phases which could avoid unexpected operational and configuration errors. Its modular nature makes system development, system debugging, and system administration simple and dynamic.

3 Fuzzy Neural Network Based on Genetic Algorithm

An artificial neural network (ANN) is a way of reasoning inspired by principles of neurons in the central nervous system of living things. It is a set of one or more layers of nodes interconnected by directed, weighted links, and trained with desired input–output patterns through which it can learn weights on the links that provide the correct outputs for the training and similar inputs. So it can be used to create a decision support system (DSS) when sufficient training examples exist. Fuzzy logic is a technique for the representation and handling of imprecise and vague information through a fuzzy rule-based system. In spite of this benefit, it typically consumes much time for designing and tuning the membership functions. Moreover, it does not contain a competent learning algorithm to minimize output errors. So neural network can be used for this same task to get benefited from learning, adaptation, fault tolerance, parallelism, and generalization. But it suffers from relatively slow learning process. To complement each of the above individual systems with their advantages, fuzzy neural systems or neuro-fuzzy models were developed as intelligent hybrid systems which are successful in several real-world applications.

Neural fuzzy systems may take up type-1 fuzzy sets, which characterize uncertainties using numbers in the range [0, 1]. But, it needs each and every element of the universal set to be associated with a specific real number. On the other hand, type-2 fuzzy sets model uncertainties by expressing their membership functions as type-1 fuzzy sets. They are of three dimensions in which the last dimension is the membership degree of each point on its two-dimensional domain. A type-2 fuzzy neural network (T2FNN) incorporates a type-2 fuzzy process as its antecedent and a two-layer neural network as its consequent [8]. A general T2FNN is complex due to the complication of type reduction process and they are not easy to follow for a variety of reasons. An algorithm for computing the centroid of an interval type-2 fuzzy set is developed in [9]. Here, the third dimension does not contribute any new information since its value is same in all points. Hence, the third dimension is disregarded. Further, making inference from interval type-2 fuzzy set is simple. Therefore, the interval T2FNN (IT2FNN) [10] can be adopted to simplify the computational process.

In the field of intrusion detection, the use of FNN alone is satisfactory for detecting repeated or more-frequent attacks, such as DoS and probing attacks. But it cannot effectively achieve high detection rates for less-frequent attacks, such as R2L and U2R attacks. When K-NN is used alone for anomaly intrusion detection, only 91 % accuracy is achieved in [11]. This inconsistency occurs due to the difference in number of records available in an intrusion detection dataset, related to the more-frequent and less-frequent attacks. Since more-frequent attacks have loads of records in an intrusion detection dataset, FNN learns them accurately and produces exact results. But due to the small number of records corresponding to the less-frequent attacks, FNN cannot be trained accurately for their detection. So it is essential to combine FNN with another methodology which can compensate this crisis.

Genetic algorithm (GA) is a global optimal search approach that makes use of operations found in natural evolution. They work sound on mixed-value, combinatorial problems and offer a suitable technique to problems that require an efficient searching. They are less vulnerable to getting caught at local optima than gradient search methods. But, if genetic algorithm is used independently in intrusion detection, it too does not provide satisfactory results. But in [12], GA produces improved detection rate for U2R attacks which motivated us to combine GA with NN. So GA can be combined with fuzzy neural networks for achieving satisfactory detection rates in both more- and less-frequent attacks by refining the structure and/or parameters of the underlying FNN. The fundamental concept is to preserve a population of chromosomes for symbolizing candidate solutions to the problem being studied. This population matures by time in the course of selection, crossover, and mutation to find the best one(s). In [13], an iterative approach is suggested where each individual produces one rule.

There are various GA-based algorithms to evolve the parameters of membership functions and the number of fuzzy rules. For a GA-based algorithm employed in fuzzy neural networks, the initial network structure is more often given initially. Then GA is adopted for enhancing the systems topology through membership functions and/or fuzzy rules. A fuzzy neural network is used in [14] to extract Mamdani-type fuzzy rules [15] from the input–output space in three stages. But this approach requires fuzzy partitions of input and output spaces in advance. The parameters of membership functions and the number of fuzzy rules are developed in [16] by encoding them to form a complex and long chromosome. But, the number of fuzzy sets for each variable must be provided as earlier information and needs the maximum acceptable number of fuzzy rules.

4 Proposed Intrusion Detection System

In our work, a cloud intrusion detection system (CIDS) is designed for infrastructure-as-a-service (IaaS) model for achieving cloud user security. This system is modeled as a Fuzzy Neural Network based on Genetic Algorithm. Figure 1 shows our proposed system of intrusion detection in cloud environment. This proposed approach consists of four phases, namely Clusters formation, Extraction of initial fuzzy rules, Fuzzy rulebase optimization, and Parameters refinement. A method based on GA is applied for fuzzy rulebase optimization and a neural fuzzy system with dynamical optimal learning algorithm is used for parameters refinement. The completion of phase IV results in the formation of rulebase which can be used for detecting intrusions in real time. So, whenever the cloud user tries to access the service, his/her pattern of activity is audited across fuzzy rulebase. Inference from this comparison is used in making decisions regarding intrusions and normal activities.

Fig. 1
figure 1

Proposed model for cloud intrusion detection system

4.1 Clusters Formation

Fuzzy rule extraction methods depend on clustering in the dataset. Each cluster should effectively classify a region in the input data space that can enclose a sufficient amount of data to hold the presence of a fuzzy input/output relationship. The Fuzzy C means algorithm [17] and Kohonen’s Self-Organizing Map [18] are familiar clustering algorithms. But their goodness relies upon the initial values of the number of cluster centers and their locations. Mountain method [19] is proposed for their estimation. In spite of its simplicity and effectiveness, it has exponential growth of computation with the size of the problem. In subtractive clustering [20], without any initial guess, all data points are regarded as cluster centers. But the trouble is that at times the real cluster center may not be pinpointed to one of the data points. And also, if the neighborhood radius is too small, the effect of neighboring data points will be ignored. If it is large, it will include all data points in its neighborhood. Still, this method provides a high-quality approximation with shortened computations. K-means clustering [21] achieves better accuracy than other clustering techniques. FCM clustering also achieves accuracy close to K-means clustering. But it is computationally slower due to the complications involved in its fuzzy measures calculations. In our work, for phase I of Fig. 1, we use modified K-means algorithm [22] to separate the training dataset into several clusters through symmetry distance between patterns. The similarity between each pattern and each existing cluster is computed to determine whether the pattern has to be associated with an already existing cluster or with a new cluster. To create new clusters, Minkowski distance [23] is used rather than Euclidean distance used in [22].

Let the training dataset contain \( k \) number of records \( a_{1} ,a_{2} , \ldots ,a_{k} \) where each record \( a_{j} \) has \( r \) patterns \( \left[ {p_{1j} ,p_{2j} , \ldots ,p_{rj} } \right] \). Each record \( a_{j}, \) \( 1 \le j \le k \), indicates an attack type and is associated with an output \( d_{j} \). This \( d_{j} \) denotes the type of an attack that should be detected successfully for an input record \( a_{j} \) during training and testing. Let n be the number of clusters, where clusters are represented by \( C_{1} ,C_{2} , \ldots ,C_{n} \). Initially no clusters exist. Through the process of training only, clusters are formed based on the similarity between records. Each \( C_{i} \) has mean \( m_{i} = \left[ {m_{1i} ,m_{2i} , \ldots ,m_{ki} } \right] \), deviation \( \rho_{i} = \left[ {\rho_{1i} ,\rho_{2i} , \ldots ,\rho_{ki} } \right], \) and a height \( h_{i} \) that is the mean of expected outputs of patterns enclosed in \( C_{i} \). Let \( S_{i} \) be the number of training patterns available in \( C_{i} \). The following is an algorithm used in our work for cluster formation:

  1. 1.

    Select K patterns at random and initialize their cluster centroids as \( C_{1} ,C_{2} , \ldots ,C_{k} \).

  2. 2.

    Use traditional K-means clustering for initializing centroids of all clusters.

  3. 3.

    When they converge or a prespecified terminating condition is fulfilled, goto next step.

  4. 4.

    For pattern \( x \), find the cluster centroid by \( k^{ * } = \arg \mathop {\hbox{min} }\limits_{i = 1}^{K} d_{S} \left( {x,C_{i} } \right) \), where \( d_{S} (x,C_{i} ) \) is the symmetry distance between a pattern \( x \) and \( C_{i} \).

  5. 5.

    If this calculated distance is less than a prespecified distance, assign \( x \) to \( C_{{k^{ * } }} \).

  6. 6.

    Else assign \( x \) to the cluster \( C_{{k^{ * } }} \), by calculating \( k^{ * } = \arg \mathop {\hbox{min} }\limits_{i = 1}^{K} d_{M} \left( {x,C_{i} } \right) \) where \( d_{M} (x,C_{i} ) \) is the Minkowski distance between \( x \) and \( C_{i} \).

  7. 7.

    Calculate the updated centroid of \( K \) clusters by \( C_{k} \left( {t + 1} \right) = \frac{1}{{N_{k} }}\sum\limits_{{i \in U_{k} \left( t \right)}} {x_{i} } \). Here, at time \( t \), \( U_{k} \left( t \right) \) is a collection of patterns in \( kth \) cluster and \( N_{k} \) is the cardinality of \( U_{k} \).

  8. 8.

    If the clusters are modified or the current iteration number is less than the predefined limit, goto (3). Else, stop.

The above algorithm results in a collection of clusters after the completion of processing all training patterns. Type-2 fuzzy TSK rules are then constructed from these clusters.

4.2 Fuzzy Rules Extraction

For extracting fuzzy rules from a dataset (for phase II shown in Fig. 1), several methods were proposed. A fuzzy partitioning method is proposed in [24] to create a set of fuzzy rules. This method segregates the input space into a group of subspaces. But it has increased complexity in terms of speed. In [25], a combination of unsupervised clustering and gradient descent approach is explained for the same purpose. But, for enormous volume of patterns, it is also a time-consuming method. Another method [26] extracts fuzzy rules by first splitting the output space into several clusters. Then, based on the expected output, training patterns are attached to clusters. In this way, partitioning of input space is done. But this method suffers from clusters overlapping. A genetic algorithm-based approach is given in [27] to generate rules from datasets.

At first all data are normalized into the range [0, 1] and then each cluster \( C_{i} \) is converted into a type-2 fuzzy rule of the form

\( {\text{If}}\;a_{1} \;{\text{is}}\;\tilde{A}_{1i} \;{\text{and}}\;a_{2} \;{\text{is}}\;\tilde{A}_{2i} \;{\text{and}}\; \ldots {\text{and}}\;a_{k} \;{\text{is}}\;\tilde{A}_{ki} \quad {\text{Then}}\;b\;{\text{is}}\;c_{i} = \omega_{0i} + \omega_{1i} a_{1} + \cdots + \omega_{ki} a_{k} \), where \( \tilde{A}_{1i} ,\tilde{A}_{2i} , \cdots \tilde{A}_{ki} \) are type-2 fuzzy sets for \( a_{1} ,\;a_{2} , \ldots ,a_{k} \) and each \( \omega \) is a rule weight of real type. The membership function of \( \tilde{A}_{ij} \),\( 1 \le i \le k \), is \( \mu_{{\tilde{A}_{ij} }} \left( {a_{i} } \right) = {\text{gauss}}\left( {u;{\text{gauss}}\left( {a_{i} ;m_{ij}^{p} ,\sigma_{ij}^{p} } \right),\sigma_{ij}^{s} } \right) \), where \( u \in \left[ {0,\;1} \right] \). Each rule includes mean, deviation, and the scaled deviation as its antecedent parameters and the rule weight as its consequent parameters. The weight is tied to a rule to determine a degree of fulfillment by multiplying with antecedents.

4.3 Fuzzy Rulebase Optimization

The generated T2FNN is based on neurons. Each neuron possesses a center vector and a width vector, thus symbolizes one cluster in the input space. This initial structure of a T2FNN may be redundant and may lead to unnecessary rules in the extracted set of fuzzy rules. Hence, for optimizing the topology of initial network structure, GA is recommended. This optimizes fuzzy rulebase (for phase III of Fig. 1) by identifying the least important fuzzy rules. GAs represent data as chromosomes. The fitness value of a chromosome should be a function of both the number of fuzzy rules and the network performance [28] for selecting the best individuals. The process is rerun for several generations until reaching the individual(s) that firmly accommodate the desired condition [29].

  1. 1.

    Initialize population by selecting random individuals.

  2. 2.

    Repeat

    1. a.

      For the size of the population

      1. i.

        Select two individuals as \( parent_{1} = w_{1} ,w_{2} , \ldots ,w_{len} \) and \( parent_{2} = x_{1} ,x_{2} , \ldots ,x_{len} \) by standard roulette wheel selection, where \( len \) is the number of fuzzy rules in the initial rulebase.

      2. ii.

        Apply the following crossover to produce a child as \( child = o_{1} ,o_{2} , \ldots ,o_{len} \), for \( i = 1,2, \ldots ,len \):

        $$ o_{i} = \left\{ \begin{aligned} p_{i} ,\quad {\text{if}}\;w_{i} = x_{i} \hfill \\ o_{i} ,\quad {\text{if}}\;w_{i} \ne x_{i} \;{\text{and}}\;\tau \ge 0.5 \hfill \\ \bar{o}_{i} ,\quad {\text{if}}\;w_{i} \ne x_{i} \;{\text{and}}\;\tau < 0.5 \hfill \\ \end{aligned} \right.. $$

        Here, \( \tau \) is a random number.

      3. iii.

        Apply mutation to the \( child \) with mutation rate as 0.1.

      4. iv.

        Calculate the fitness values for \( child\left( {f_{C} } \right),\;parent_{1} \left( {f_{P1} } \right)\;{\text{and}}\;parent_{2} \left( {f_{P2} } \right) \) as

        \( f = \frac{1}{{RMSD^{2} + \frac{1}{{\left( {1 + len - ctlen} \right)^{2} }}}} \),

        where \( ctlen \) is the number of fuzzy rules in the current rulebase and RMSD is the root-mean-square deviation.

      5. v.

        Calculate \( d_{P1} \) as the Minkowski distance between \( parent_{1} \) and \( child \).

      6. vi.

        Similarly, calculate \( d_{P2} \) between \( parent_{2} \) and \( child \).

      7. vii.

        If \( d_{P2} > d_{P1} \), and \( f_{C} > f_{P1} \), replace \( parent_{1} \) with \( child \). Else if \( d_{P2} \le d_{P1} \) and \( f_{C} > f_{P2} \), replace \( parent_{2} \) with \( child \).

    2. b.

      End For

  3. 3.

    Until the maximum number of generations is reached.

  4. 4.

    Extract the best individuals as a final solution.

This phase is concluded with the compact set of type-2 fuzzy rules by eliminating least significant fuzzy rules.

4.4 Refinement of Fuzzy Rules

This phase is included in our proposed system to refine the parameters of generated fuzzy rules. A self-evolving IT2FNN is proposed in [30] which combines online clustering and rule-ordered Kalman filter algorithm. Gradient descent backpropagation and gradient descent with adaptive learning rate backpropagation are suggested in [31] to refine parameters. We proceed to improve the accuracy of these fuzzy rules through dynamical optimal learning algorithm. In all iterations of training, we begin to improve antecedent parameters by keeping consequent parameters as constant. After that, consequent parameters are refined by holding antecedent parameters as constant. This refinement technique is repeated until this phase results in preferred approximation precision. As mentioned earlier, consequent parameters are kept as fixed initially and IT2FNN is used with gradient descent learning to optimize antecedent parameters. By using BP method, for \( k \) input–output training patterns \( \left[ {p_{j} :d_{j} } \right],\;j = 1,2, \ldots ,r \), the following function of mean square error (MSE) is considered:

$$ e_{j} = \frac{1}{2}\left[ {y\left( {p_{j} } \right) - d_{j} } \right]^{2}. $$
(1)

Equation (1) is our objective function that is to be minimized (optimized). Here, \( y\left( {p_{j} } \right) \) is the actual output for \( j{\text{th}} \) pattern. We use Eq. (2) to refine mean and a similar type of equation is used to refine standard deviation by keeping the weight as fixed.

$$ m_{j1}^{i} \left( {l + 1} \right) = m_{j1}^{i} \left( l \right) - \left( {\frac{{\alpha \left( {y\left( {p_{l} } \right) - d_{l} } \right)\left( {p_{jl} - m_{j1}^{i} } \right)N\left( {m_{j1}^{i} ,\sigma_{j}^{i} ;p_{jl} } \right)}}{{2\sigma_{j}^{{i^{2} }} }} \times \frac{{\left( {\prod\nolimits_{\begin{subarray}{l} q = 1 \\ q \ne j \end{subarray} }^{t} {\bar{u}_{{\tilde{A}_{q}^{i} }} } } \right)\left( {\omega_{1i} - y_{1} } \right)}}{{\left( {\sum\nolimits_{i = 1}^{L} {\bar{g}^{i} } + \sum\nolimits_{i = L + 1}^{M} {\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{g}^{i} } } \right)}}} \right). $$
(2)

Equations (3) and (4) are used to tune consequent parameters by keeping antecedent parameters fixed. If \( i > L \),

$$ \omega_{1i} \left( {l + 1} \right) = \omega_{1i} \left( l \right) - \alpha \left( {\frac{{\left( {y\left( {p_{l} } \right) - d_{l} } \right)}}{2}} \right)\left( {\frac{{\prod\nolimits_{q = 1}^{t} {\bar{u}_{{\tilde{A}_{q}^{i} }} } }}{{\sum\nolimits_{i = 1}^{L} {\bar{g}^{i} } + \sum\nolimits_{i = L + 1}^{M} {\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{g}^{i} } }}} \right). $$
(3)

If \( i > L \),

$$ \omega_{1i} \left( {l + 1} \right) = \omega_{1i} \left( l \right) - \alpha \left( {\frac{{\left( {y\left( {p_{l} } \right) - d_{l} } \right)}}{2}} \right)\left( {\frac{{\prod\nolimits_{q = 1}^{t} {\underline{u}_{{\tilde{A}_{q}^{i} }} } }}{{\sum\nolimits_{i = 1}^{L} {\bar{g}^{i} } + \sum\nolimits_{i = L + 1}^{M} {\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{g}^{i} } }}} \right), $$
(4)

where \( \alpha \) is the learning rate parameter, \( M \) is the total number of rules in the rule base, \( t \) is the number of inputs to rule \( i \), \( \underline{u}_{{\tilde{A}_{q}^{i} }} \left( {p_{j} } \right) \) is the lower bound membership value of fuzzy sets \( \tilde{A} - q^{i} \), \( \bar{u}_{{\tilde{A}_{q}^{i} }} \left( {p_{j} } \right) \) is the upper membership value of fuzzy sets \( \tilde{A} - q^{i} \), and \( * \) is the product t norm. Here, iterative Karnik–Mendel method is used to acquire \( L \) and \( R \) values. Here,

$$ N\left( {m_{j}^{i} ,\sigma_{j}^{i} ;p_{j} } \right) = \exp \left( { - \frac{1}{2}\left( {\frac{{p_{j} - m_{j}^{i} }}{{\sigma_{j}^{i} }}} \right)^{2} } \right) $$
(5)
$$ \underline{g}^{i} = \underline{u}_{{A_{i1} }} \left( {p_{1} } \right) * \underline{u}_{{A_{i2} }} \left( {p_{2} } \right) * \cdots * \underline{u}_{{A_{in} }} \left( {p_{n} } \right) = \prod\limits_{q = 1}^{t} {\underline{u}_{{A_{q}^{i} }} \left( {p_{q} } \right)} $$
(6)

and

$$ \bar{g}^{i} = \bar{u}_{{A_{i1} }} \left( {p_{1} } \right) * \bar{u}_{{A_{i2} }} \left( {p_{2} } \right) * \cdots * \bar{u}_{{A_{in} }} \left( {p_{n} } \right) = \prod\limits_{q = 1}^{t} {\bar{u}_{{A_{q}^{i} }} \left( {p_{q} } \right)}. $$
(7)

Similarly consequent parameters are optimized by fixing antecedent parameters as constant. After the system is trained, it can be used for detecting intrusions in real time. When the cloud user requires access to the service, his/her pattern of activity is checked against fuzzy rulebase. Inference from this comparison is used in making decisions regarding intrusions, after type reduction and defuzzification.

5 Experimental Results

Our CIDS consists of two entities, namely Cloud and Consumer. We define Cloud as an entity which hosts user’s data and acts as Infrastructure-as-a-service (IaaS) provider. Consumers are defined as entities which request services from the Cloud. We have created the Cloud environment with the machine configured with Intel core 2 Duo CPU, 2.5 GHz, 4 GB memory, and GNU = Linux kernel 2.6. 32. This Cloud is designed using Eucalyptus [32] for facilitating Consumer to avail IaaS services. We conducted experiments using Cloud Intrusion Detection Dataset (CIDD) [5] in three steps labeled, training, validation, and testing. In training, fuzzy rules are extracted and optimized by our proposed IDS architecture using the training data to achieve maximum accuracy on the dark data. The validation step is to assess how precisely a system will act upon in practice. This method observes the error on validation dataset and terminates system training when this error starts to increase. In the testing phase, the test data are passed through the saved trained model to detect intrusions.

The cloud intrusion detection dataset consists of connection records for describing normal and intrusive accesses. Each such record has 41 features for describing a total of 24 attack types. These attack types are grouped into four main categories of attacks: denial-of-service (DoS), probe, remote-to-local (R2L), and user-to-root (U2R), as shown in Table 2. The original dataset is of size 744 MB with 4,940,000 records, where each record defines a specific attack. Using all 41 features could result in a big IDS model with increased computation time, which could be an overhead for online detection. Moreover, if the dataset contains unrelated features, analysis will be difficult to detect suspicious behaviors. CIDS must therefore reduce the amount of data to be processed. By using an intelligent input feature selection [33], the number of features is reduced to 12 and shown in Table 3. Features in the datasets are of different forms such as symbolic, discrete, and continuous. Most pattern classification methods are not capable to process data in such a format. So each feature is linearly scaled to the range [0, 1] by preprocessing. We selected 20,000 records made at random in which 12,000 are normal and 8000 are intrusive. Training, validation, and testing enclose 11,982, 4810. and 3208 samples, respectively. Since the dataset has five diverse classes, we carried out a five-class binary classification.

Table 2 Types of attacks
Table 3 List of features considered in the proposed method

Experiments are conducted for the threshold values of 0.05, 0.10, 0.15, and 0.20. These thresholds are considered as mutation rates. To get away from local minima, we have used these di_erent error threshold values randomly in our experiments. The results are observed at the end of phases III and IV which are given in figures from 2 to 5. Figure 2 explains the accuracy obtained at the end of phase III for training dataset and Fig. 3 explains the same for the test dataset.

Fig. 2
figure 2

Training set detection rate—without parameters refinement

Fig. 3
figure 3

Test set detection rate—without parameters refinement

Using the same training and test datasets, results are obtained at the completion of phase IV. These results are explained in Figs. 4 and 5 and we found significant improvement in detection accuracy which shows the importance of parameter refinement phase in our proposed intrusion detection process. The average detection rates for each category of pattern during testing and training are shown in Table 4. It is inferred that the proposed approach achieves the accuracy of 98.70 % during training phase and 98.49 % during testing phase. Moreover, we obtained 0.99 and 0.98 classification rates during training and testing, respectively.

Fig. 4
figure 4

Training set detection rate—with parameters refinement

Fig. 5
figure 5

Test set detection rate—with parameters refinement

Table 4 Detection accuracy

Since we found that our cloud intrusion detection system achieves better results with parameters refinement phase, the following tables were given by considering measurements obtained after parameters refinement phase.

We conducted experiments for various population sizes (200, 400, and 600). Table 5 gives detection accuracies during training for every combination of threshold value and number of generations. Similar types of experiments were conducted during testing and the results are shown in Table 6.

Table 5 Training accuracy based on population size
Table 6 Testing accuracy based on population size

From these measurements, we infer that lower populations present good accuracies with lower threshold values, whereas higher populations achieve good results with moderately higher threshold values. Additionally, lower populations exhibit likely equal performance irrespective of the number of generations. Analogous experiments are conducted for different number of generations (200, 500, 1000, and 2000). Results are shown in Tables 7 and 8. Their observations imply that lower number of generations produces good accuracies with lower populations and higher thresholds. On the other hand, higher number of generations brings fine accuracies for lower populations and lower thresholds.

Table 7 Training accuracy based on number of generations
Table 8 Testing accuracy based on number of generations

Table 9 shows MSE values obtained from training phase and testing phase for various values of θ, which gives the average MSE of 2.001 and 2.6768 during training and testing, respectively.

Table 9 Training and testing MSE based on threshold θ

Tables 10 and 11 show the confusion matrix during training and testing, respectively. From these tables, it is inferred that, though we have some detection degradation for DoS and R2L attacks, our system behaves perfectly in identifying probe and U2R attacks. Further its true negative prediction is increased by 0.8354 % during testing.

Table 10 Confusion matrix—training
Table 11 Confusion matrix—testing

We analyzed the rate of recall, precision, accuracy, and \( F_{1} \) score for training and testing and the results are given in Table 12. Further, we compared them with that of other methods. This comparison shows that the proposed system can achieve good results in the field of cloud intrusion detection.

Table 12 Comparison with other approaches

Results of using NN and GA individually in identifying different types of attacks in intrusion detection are shown in Table 13 along with the results of our proposed method which shows that our proposed method behaves well in identifying all four types of attacks with an average detection rate of 98.598 %.

Table 13 Results of using NN or GA alone in intrusion detection

To show the effectiveness of our proposed cloud intrusion detection system, we conducted similar experiments using UCS (supervised learning classifier system) [37] and NICE (Network Intrusion detection and Countermeasure sElection in virtual network systems) [39] for detecting intrusions in a cloud environment. The results of comparison are made by varying number of jobs and number of nodes (scalability) and they are explained as follows:

Further, Table 14 gives the average success rate of intrusion detection when a single framework is adopted in various researches. In all these cases, the achieved detection rate is less than the average detection rate (98.598 %) of our proposed method which shows the importance of our hybrid technique.

Table 14 Comparison of average intrusion detection rates under various non-hybrid frameworks

Rates of error in detecting intrusions (false negatives and false positives) during training are shown in Fig. 6 for the three intrusion detection approaches (UCS, NICE, and Proposed method). From this figure, it is inferred that all the three methods achieve approximately similar error rate for less number of jobs. But our proposed method gives lower error rate than that of remaining two methods and this accuracy improvement is achieved through the application of our hybrid method, whereas UCS and NICE employ single methodology.

Fig. 6
figure 6

Comparison of error rate under different number of jobs

As shown in Fig. 7, the percentages of CPU (resource) utilization for different number of jobs under the three different methods are compared. In all three methods, CPU usage is almost identical for less number of jobs. As the number of jobs tends to increase, UCS, NICE, and proposed methods exhibit different performances in CPU utilization and our proposed hybrid intrusion detection approach consumes less CPU time than that of other two methods. It saves 28 % of CPU time than NICE and 33 % of CPU time than UCS in the field of cloud intrusion detection during training phase.

Fig. 7
figure 7

Comparison of CPU utilization under different number of jobs

To show the speed of computation, we varied the number of nodes in a cloud for same number of jobs. For each of the three frameworks, average execution time during training is measured. Figure 8 shows the results of these experiments incorporating intrusion detection in a cloud environment. The experimental results prove that the speed of the proposed method is directly proportional to the size of the cloud and when we double the number of nodes for similar set of jobs, its speed up ratio is also increased by a factor of 2.15 approximately. Further, it has improved speed up ratio than that of NICE for larger number of nodes in a cloud. This shows the commendable scaling tendency of our proposed method toward large networks. But for small number of nodes, NICE produces better speed up ratio. But this deviation is only about 0.07. But still, this issue of our proposed method has to be addressed in future.

Fig. 8
figure 8

Comparison of speedup ratio under different number of nodes

6 Conclusion

In this paper, we have proposed a methodology for modeling IDS in a cloud. The intrusion detection training dataset is partitioned into several clusters with similar patterns belonging to the same cluster using modified K-Means algorithm with Minkowski distance. Each of the resulting clusters is defined with the membership function by statistical mean and deviation which results in a type-2 fuzzy TSK-rule. A fuzzy neural network is established correspondingly and the compact fuzzy rulebase is constructed by applying genetic algorithm. Then the associated parameters are refined by dynamical optimal learning. Later, for every new input from the test dataset, a crisp output is obtained by integrating the inferred results of all the rules. This result is then compared with the desired result based on which detection rates are calculated. The proposed scheme enables cloud users to avail cloud services without worrying about intrusions when they wish to switch to cloud data centers. Since the achieved results are optimum as compared to other approaches in the field of attack detection, we trust that our proposed model provides a convincing method.