1 Introduction

Due to the increasing importance of the radio spectrum, dynamic spectrum access techniques have been identified as a way to improve the efficiency of spectrum management [1]. This paper introduces techniques that can improve spectrum utilization. Cognitive radio refers to the situation in which licensed spectrum that is not being used by its primary users is allocated to secondary (unlicensed) users. If the primary users resume their activity, the secondary users must vacate the spectrum.

Spectrum handoff is the process of transferring a channel from one user to another. It occurs when a secondary user has started transmitting and the licensed (primary) user needs to access the channel again; the secondary user then moves to an idle channel to finish its transmission. The spectrum handoff process helps in designing an efficient network architecture [2]. It involves several steps, such as the evaluation and maintenance phases. In the evaluation phase, a cognitive device studies the environment and its specifications to determine whether it may use the spectrum. Once the device determines that it must release the spectrum, it stops transmitting data and switches its operating frequency to a different channel.

Figure 1 illustrates the concept of spectrum handoff for a primary channel and two secondary users [3, 4]. Four scenarios arise depending on the presence or absence of the primary user in the channel. When the primary user appears, the secondary users must immediately stop all communication and vacate the channel completely, then search for an appropriate spectrum band before they can resume their operations. Once a free band is found, the SUs switch to the new channel to continue their communication. When the primary user moves to a different channel, the vacated spectrum is allocated to the secondary users based on their priority.

Fig. 1

The phases in spectrum handoff: communication starts between two SUs on free band 1; a PU appears, so the SUs search for a new free band; the SUs shift to free band 3; the PU vacates band 1, so the SUs start communication between band 1 and band 3

Spectrum handoff is of two types: reactive handoff [5] and proactive handoff [6]. In reactive handoff, the target channel is selected on demand, after the PU appears, so the operation can suffer from prolonged handoff latency and interference. Proactive handoff, on the other hand, selects the target channel in advance based on historical usage data. Several works discuss the merits of reactive and proactive spectrum handoff. Guipponi et al. [7] proposed a fuzzy-based method to handle spectrum handoff. Wang and colleagues [8] looked into the link maintainability of networks when the SU vacates a position. The authors of [9] analyzed the effects of spectrum handoff on the maintainability of a network after the SU moves out of a position, and found that handoff performance had not been thoroughly studied. To minimize the disruption caused by handoff, the authors of [10] proposed a voluntary spectrum handoff scheme.

The authors of [11] examined the various factors that influence the performance of a spectrum handoff and developed a set of metrics to analyze its progress, including the number of handoffs, the switching delay, the non-completion probability, and the link maintainability [12]. Spectrum handoff is mainly used on mobile platforms such as smartphones [13, 14]. It can be performed efficiently with a hybrid spectrum handoff strategy, which automatically identifies the channel that needs to be handled and provides a quick response; however, traffic delays can still cause poor handoff performance [15, 16]. PUTPOSH determines whether a handoff is required based on the arrival of the PU and the service time [17]. The queueing model used in this process is the M/G/1 queue, which tracks the status of each task as it progresses; the waiting user is notified when an idle channel is spotted [13].

Unfortunately, current multiple access techniques cannot handle the massive traffic volumes that will inevitably occur in the future. One of the most promising ways to improve the efficiency of future communications is non-orthogonal multiple access (NOMA) [18], which offers a large gain in spectral efficiency. In NOMA, a multi-user signal is multiplexed at the transmitter using superposition coding: the users' signals are transmitted at different power levels depending on their channel conditions, and the users with the most problematic channel conditions are given the highest power allotment. Accurate channel state information at the transmitter is therefore very important for the efficiency of the system.

At the receiver side, the user with the weaker channel decodes its high-power signal directly, treating the other user's signal as noise, while the user with the stronger channel performs successive interference cancellation: it decodes and subtracts the weaker user's signal before recovering its own. In most conventional multiple access systems, orthogonality between users is maintained by using guard periods; this can be very inefficient because the guard periods waste spectrum resources. In NOMA, the signals are sent at different power levels and the guard period is removed entirely. Despite the advantages of NOMA, it is still challenging to maintain user fairness, owing to receiver complexity and the need for accurate channel state information. To address these issues, we have used machine learning techniques to improve the efficiency of the system.
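To make the superposition-coding and SIC steps concrete, the following minimal Python sketch simulates a two-user power-domain NOMA downlink under simplifying assumptions (BPSK symbols, a common AWGN channel, perfect channel knowledge); the power split and noise level are illustrative values, not parameters of our system.

```python
import numpy as np

# Two-user power-domain NOMA sketch (toy model, not the paper's setup).
rng = np.random.default_rng(0)
n = 10_000

bits_near = rng.integers(0, 2, n)          # strong-channel (near) user
bits_far = rng.integers(0, 2, n)           # weak-channel (far) user
s_near = 2.0 * bits_near - 1.0             # BPSK mapping 0/1 -> -1/+1
s_far = 2.0 * bits_far - 1.0

# Superposition coding: the weak user gets the larger power share.
p_far, p_near = 0.8, 0.2
x = np.sqrt(p_far) * s_far + np.sqrt(p_near) * s_near

y = x + 0.05 * rng.standard_normal(n)      # common AWGN channel (toy)

# Far user: decodes directly, treating the near user's signal as noise.
far_hat = (y > 0).astype(int)

# Near user: successive interference cancellation (SIC) - decode and
# subtract the far user's high-power signal, then decode its own.
far_est = np.sign(y)                       # re-modulated far-user estimate
residual = y - np.sqrt(p_far) * far_est
near_hat = (residual > 0).astype(int)

print("far-user BER :", np.mean(far_hat != bits_far))
print("near-user BER:", np.mean(near_hat != bits_near))
```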

2 Machine Learning Algorithms

Figure 2 shows a basic machine learning workflow, which starts with the training data and their labels [19, 20]. These are used by the learning algorithm to distinguish between the different types of data: the machine learning algorithm block consumes the labels and features required for training and, after the training phase, produces a predictive model. This model is then used to predict the state of new data. Data collected during the prediction phase are transformed into the same features and fed to the model block to produce the final output.

Fig. 2

Machine Learning workflow with training and prediction process

Classification and regression are the two types of supervised machine learning techniques commonly used in the prediction phase [21]. In the former we predict categorical values, while in the latter we predict continuous values. A classification algorithm assigns new observations to classes learned from the training data: a program learns from the given dataset and then groups new observations into various classes or groups, for instance labelling an image as "cat" or "dog", or an email as "spam" or "not spam".
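As an illustration of this train-then-predict workflow, the following sketch fits an off-the-shelf classifier to a synthetic two-feature dataset; both the dataset and the classifier choice are placeholders for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic labelled data standing in for "training data + labels" (Fig. 2).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

model = KNeighborsClassifier()        # any classifier fits this workflow
model.fit(X_train, y_train)           # training phase
y_pred = model.predict(X_test)        # prediction phase
print("accuracy:", accuracy_score(y_test, y_pred))
```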

2.1 Logistic Regression

Although the terms linear and logistic regression sound similar, they are used differently [22]. Linear regression fits a continuous target, whereas logistic regression addresses classification problems. Since the probability p(x) is bounded between 0 and 1, we first apply its logit transformation to obtain a quantity that can be modelled linearly; that is, we work with log p(x)/(1 − p(x)):

$$\log\frac{p\left(x\right)}{1-p\left(x\right)}={\beta}_{0}+\beta x$$
(1)

After solving for p(x):

$$p\left(x\right)=\frac{{e}^{{\beta}_{0}+\beta x}}{{e}^{{\beta}_{0}+\beta x}+1}$$
(2)

To convert these probabilities into class labels, we select a threshold, such as 0.5. The likelihood of the predicted class is computed from the training data points x and y.
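The following short sketch illustrates Eqs. (1) and (2) numerically; the coefficient values β0 and β are invented for illustration rather than fitted to data.

```python
import numpy as np

# Logistic model of Eqs. (1)-(2) with illustrative (not fitted) coefficients.
beta_0, beta = -2.0, 0.8

def p(x):
    """Eq. (2): p(x) = e^(b0 + b*x) / (e^(b0 + b*x) + 1)."""
    z = beta_0 + beta * x
    return np.exp(z) / (np.exp(z) + 1.0)

x = np.array([0.0, 1.0, 2.5, 5.0])
prob = p(x)
label = (prob >= 0.5).astype(int)      # threshold at 0.5
print(np.c_[x, prob.round(3), label])

# Sanity check of Eq. (1): logit(p(x)) recovers the linear term b0 + b*x.
assert np.allclose(np.log(prob / (1 - prob)), beta_0 + beta * x)
```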

2.2 KNN Algorithm

Finding the nearest neighbours means identifying the points within a given dataset that are closest to a query point; the algorithm classifies a test case by a majority vote among those neighbours. Before implementing KNN, the first step is to represent the data as feature vectors [23]. The algorithm computes the distance between each stored point and the test point and uses the nearest points to predict its class, on the assumption that nearby points are likely to share the same class. The distance can be measured with the Minkowski, Hamming, or Euclidean metric [24]. The Euclidean distance between two points is computed as:

$$D\left((x_1,y_1),(x_2,y_2)\right)=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$$
(3)

For a given value of K, the algorithm finds the K nearest neighbours of the data point and assigns it the class that holds the majority among those neighbours. After computing the distances, the algorithm returns the class with the highest probability:

$$P\left(Y=i\mid X=x\right)=\frac{1}{K}\sum_{j\in A}I\left(y^{(j)}=i\right)$$
(4)
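A bare-bones implementation of this procedure, following Eqs. (3) and (4), might look as follows; the toy training set is invented for illustration.

```python
import numpy as np

# Minimal KNN sketch: Euclidean distances to every stored point (Eq. (3)),
# then a majority vote over the K nearest neighbours (Eq. (4)).
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])

def knn_predict(x, k=3):
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # Eq. (3)
    nearest = np.argsort(d)[:k]                     # indices of K nearest
    votes = np.bincount(y_train[nearest], minlength=2)
    return votes.argmax(), votes / k                # class label, Eq. (4)

label, probs = knn_predict(np.array([2.0, 2.0]), k=3)
print("predicted class:", label, "class probabilities:", probs)
```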

2.3 SVM Algorithm

One of the most widely used machine learning algorithms is the SVM [25], which handles both regression and classification; in this section we focus on the classification task. It is typically used on small and medium-sized datasets. The main objective of the algorithm is to find the optimal hyperplane that separates the data points into two classes. Suppose we have a set of linearly separable training examples, each labelled y = +1 or y = −1. The training data can then be written as

$$\{x_i,y_i\},\quad i=1,\ldots,N,\quad y_i\in\{-1,+1\},\quad x_i\in\mathbb{R}^D$$
(5)

As the data points can be separated linearly, we assume D = 2 to keep the explanation straightforward.

The objective of the SVM is to orient this hyperplane as far as possible from the nearest members of both classes; the examples closest to the optimal hyperplane are the support vectors. As Fig. 3 shows, two hyperplanes, H1 and H2, pass through the support vectors of the +1 and −1 classes respectively, so H1: mx + c = +1 and H2: mx + c = −1.

Fig. 3

SVM classification with hyperplanes

Additionally, the distances of the H1 and H2 hyperplanes from the origin are (1 − c)/|m| and (−1 − c)/|m|, respectively. The margin can therefore be written as

$$M=\frac{1-c}{\left|m\right|}-\frac{-1-c}{\left|m\right|}=\frac{2}{\left|m\right|}$$
(6)

where M is twice the margin, so the margin itself is 1/|m|. Since the optimal hyperplane maximizes the margin, the SVM objective reduces to maximizing the term 1/|m|.
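As a sketch of this formulation, the following code fits a linear SVM on a small separable toy set and recovers the margin 2/|m| of Eq. (6) from the learned weight vector; the data points are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data with labels in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 0.5],
              [5.0, 5.0], [6.0, 4.5], [5.5, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # plays the role of m in Eq. (6)
print("support vectors:\n", clf.support_vectors_)
print("margin 2/|m| =", 2.0 / np.linalg.norm(w))
```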

2.4 Naïve Bayes Classifier Algorithm

The Naïve Bayes classifier is a good tool for building fast machine learning models [26]. It predicts the probability of a class given the observed conditions. This probabilistic classification rests on Bayes' theorem, a rule for computing the probability that a hypothesis is true given the evidence. Bayes' theorem can be written as

$$P\left(A\mid B\right)=\frac{P\left(B\mid A\right)\,P\left(A\right)}{P\left(B\right)}$$
(7)

The posterior probability P(A|B) is the probability that hypothesis A holds given the observed evidence B. It is computed from the likelihood P(B|A), the probability of observing the evidence when the hypothesis is true, together with the prior probability P(A), the probability of the hypothesis before the evidence P(B) is examined.
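A tiny numeric check of Eq. (7) is shown below; the probabilities are invented for illustration, with A read as "handoff required" and B as "PU detected".

```python
# Bayes' theorem (Eq. (7)) with invented, illustrative probabilities.
p_a = 0.3            # prior P(A): handoff required
p_b_given_a = 0.9    # likelihood P(B|A): PU detected given handoff needed
p_b = 0.4            # evidence P(B): PU detected overall

posterior = p_b_given_a * p_a / p_b    # P(A|B)
print("P(A|B) =", posterior)           # 0.675
```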

2.5 Decision Tree Classification

The goal of the decision tree algorithm is to predict the class of a given record [27]. It starts by comparing the record's attribute value with the root attribute's value and follows the corresponding branch, continuing until it reaches a leaf node. When building a decision tree, one major issue needs to be resolved: which attribute should be selected for the root node and the sub-nodes. Two criteria are commonly used to select the best attribute: Information Gain and the Gini Index.

2.5.1 Information Gain

The information gain metric measures the change in entropy after the dataset is split on a feature; it captures how much information a feature provides about the class. A decision tree algorithm tries to maximize the information gain (Eq. (8)) when choosing which node and attribute to split on.

$$\text{Gain}\left(H,A\right)=\text{Entropy}\left(H\right)-\sum_{v\in \text{Values}\left(A\right)}\frac{\left|{H}_{v}\right|}{\left|H\right|}\,\text{Entropy}\left({H}_{v}\right)$$
(8)

where Hv is the subset of H whose examples take value v for attribute A, and Values(A) is the set of values that attribute A can take.

For two classes, entropy always lies between 0 and 1. A value of 0 indicates a pure subset (the preferable case), while a value of 1 indicates maximum impurity. The entropy is calculated as:

$$\text{Entropy}\left(H\right)=-\sum_{x=1}^{n}{P}_{x}\,{\log}_{2}{P}_{x}$$
(9)

where Px is the proportion of samples in the subset that belong to the x-th of the n classes.
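The sketch below evaluates Eqs. (8) and (9) on an invented binary-labelled set split by a two-valued attribute.

```python
import numpy as np

def entropy(labels):
    """Eq. (9): -sum_x P_x log2 P_x over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

H = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])   # parent set labels
H_v = {"low": H[:4], "high": H[4:]}            # split induced by attribute A

# Eq. (8): parent entropy minus the weighted entropy of the subsets.
gain = entropy(H) - sum(len(sub) / len(H) * entropy(sub)
                        for sub in H_v.values())
print("information gain:", round(gain, 4))
```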

2.5.2 Gini Index

The Gini index represents the purity of the classes after the data are split on a specific attribute: the better the split, the higher the purity of the resulting subsets. For a dataset D with k class labels, the index is calculated as follows.

$$\text{Gini}\left(D\right)=1-\sum_{i=1}^{k}{P}_{i}^{2}$$
(10)

If the data are divided into two subsets, D1 and D2, of sizes S1 and S2, with Pi the relative frequency of the i-th class in each subset, then the Gini index of the split is given by

$$\text{Gini}_{A}\left(D\right)=\frac{{S}_{1}}{S}\,\text{Gini}\left({D}_{1}\right)+\frac{{S}_{2}}{S}\,\text{Gini}\left({D}_{2}\right)$$
(11)

The impurity reduction is computed as

$$\varDelta \text{Gini}\left(A\right)=\text{Gini}\left(D\right)-\text{Gini}_{A}\left(D\right)$$
(12)
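Analogously, the following sketch evaluates Eqs. (10)-(12) on an invented two-way split.

```python
import numpy as np

def gini(labels):
    """Eq. (10): 1 - sum_i P_i^2 over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

D = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
D1, D2 = D[:4], D[4:]                        # split induced by attribute A

# Eq. (11): size-weighted Gini of the two subsets.
gini_a = len(D1) / len(D) * gini(D1) + len(D2) / len(D) * gini(D2)
delta = gini(D) - gini_a                     # Eq. (12): impurity reduction
print("Gini(D) =", gini(D), " Gini_A(D) =", gini_a, " reduction =", delta)
```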

2.6 Random Forest Algorithm

The Random Forest technique can perform both regression and classification tasks. It builds an ensemble of decision trees using bootstrap aggregation (bagging) [28]: instead of relying on an individual tree, Random Forest combines the outputs of several decision trees into a final prediction. Each base tree is trained on a sample of the data obtained by row sampling (bootstrap) and feature sampling, as sketched below.
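A minimal sketch of this ensemble idea using scikit-learn is given below; the synthetic dataset is for illustration, and the 10-tree setting mirrors the forest size used later in Sect. 5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-feature data standing in for the handoff dataset.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)

forest = RandomForestClassifier(n_estimators=10,     # 10 trees, majority vote
                                bootstrap=True,      # row sampling
                                max_features="sqrt", # feature sampling
                                random_state=1)
forest.fit(X, y)
print("training accuracy:", forest.score(X, y))
```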

3 Proposed System Model

The proposed system's workflow is shown in Fig. 4, and the operation of each block is described in this section. For our system, we assume two base stations, BTS1 and BTS2, with coverage areas CVG1 and CVG2 respectively, as shown in Fig. 5, and various users at varying distances from the two base stations. To maximize spectrum usage, NOMA is incorporated into the system to separate the users in the power domain. The method takes the users' distance from the base station and their power requirement as independent variables, and the requirement of handoff ('0' or '1') as the dependent variable. The resulting dataset is generated using software-defined radio.
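As a stand-in for the SDR-generated dataset, the following hypothetical sketch builds a table with the two independent variables and the binary handoff label, then splits it for training and testing; the value ranges and the labelling rule are invented so the sketch runs end-to-end and do not reflect the measured data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n_users = 100

# Hypothetical feature ranges; the real values come from the SDR setup.
df = pd.DataFrame({
    "distance": rng.uniform(0, 2000, n_users),   # distance from BTS1
    "power": rng.uniform(0, 30, n_users),        # user power requirement
})
# Toy labelling rule (invented): handoff needed beyond a coverage radius.
df["handoff"] = (df["distance"] > 1200).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df[["distance", "power"]], df["handoff"],
    test_size=0.25, random_state=7)
print(df.head())
```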

Fig. 4

Workflow of the proposed system

In Fig. 5, the moving SU is crossing the boundary of CVG1. Due to the presence of a PU, the channel currently being used by the SU must be vacated and the transmission moved to another channel. Through a cooperative spectrum sensing system, the two SUs maintain the same frequency; the red line in the figure marks the moving SU's path as it crosses the coverage boundary and the system senses the change. This model considers two spectrum conditions and can easily be extended to more spectrum channels. To solve this handoff problem, we have developed a novel method that uses machine learning techniques to manage the handoff between the two SUs. ML is a widely used computational intelligence tool applicable in various fields. The ML model can predict the handoff and idle point of the two SUs based on the data collected by their neighbours, and it can find a new spectrum band whenever the environment changes. To improve its performance, a cooperative sensing mechanism has been implemented.

Fig. 5

Spectrum handoff with two cells

4 Experimental Setup: Software Defined Radio (SDR)

The dataset is created with the help of software-defined radios [29], a technology that improves the interoperability of commercial radio systems. The experimental setup shown in Fig. 6 consists of two universal software radio peripherals (USRPs) and a pair of host computers running GNU Radio software. The goal of this study is to analyze the performance of different ML algorithms using USRP N210 hardware and GNU Radio software; such tools reduce the cost of developing and deploying commercial radio systems. The USRP from Ettus Research is one of the most popular platforms supporting software-defined radio, used in education and wireless networking.

Fig. 6

Experimental setup

The Ettus N210 and N200 series of SDR kits are designed for applications operating in the DC to 6 GHz range; the RF capabilities are determined by the installed daughter-board. The other components used during the implementation were the XCVR2450 and SBX mixed-signal daughter-boards. The USRP offers a wide range of features, including integrated analog-to-digital and digital-to-analog converters, which support signal processing functions such as up-sampling and down-sampling, and it communicates with a host computer through the driver provided by Ettus. The device has its own oscillator and timing reference and is operated through the GNU Radio software. With the help of the host computer and the software, the various parameters of the USRP are configured and recorded.
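The exact GNU Radio flowgraph is not reproduced here; instead, the following numpy sketch illustrates the underlying energy-detection idea by which complex baseband samples, such as those delivered by a USRP, can be averaged and thresholded to flag primary-user activity on a sensed band. The signal model and threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4096

# Toy baseband model: unit-power complex noise plus a PU carrier tone.
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
pu_signal = np.exp(2j * np.pi * 0.1 * np.arange(n))   # toy PU waveform
received = pu_signal + noise                          # PU-present case

energy = np.mean(np.abs(received) ** 2)   # average-energy test statistic
threshold = 1.5                           # would be set from the noise floor
print("band occupied:", energy > threshold)
```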

5 Results and Discussion

In this paper, a dataset is created to decide the necessity of spectrum handoff, and various machine learning classification algorithms are employed to provide the optimal decision boundary from the training data. Figures 7, 8, 9 and 10 show the training set and test set results for 100 and 500 users when the ML algorithms Logistic Regression, KNN, SVM, Naïve Bayes Classifier, Decision Tree Classification and Random Forest are applied. These plots are drawn over the two independent variables, i.e., distance from the base station on the x-axis and power of each user on the y-axis. Each graph shows two regions, blue and yellow, corresponding to the two predicted classes; the data points in the graph correspond to the users of the dataset.

Fig. 7

Visualization (100 users) of (a) training set result for Logistic Regression (b) test set result for Logistic Regression (c) training set result for KNN Algorithm (d) test set result for KNN Algorithm (e) training set result for SVM Algorithm (f) test set result for SVM Algorithm

Fig. 8

Visualization (100 users) of (a) training set result for Naïve Bayes Classifier (b) test set result for Naïve Bayes Classifier (c) training set result for Decision Tree Classification (d) test set result for Decision Tree Classification (e) training set result for Random Forest Algorithm (f) test set result for Random Forest Algorithm

Fig. 9

Visualization (500 users) of (a) training set result for Logistic Regression (b) test set result for Logistic Regression (c) training set result for KNN Algorithm (d) test set result for KNN Algorithm (e) training set result for SVM Algorithm (f) test set result for SVM Algorithm

Fig. 10

Visualization (500 users) of (a) training set result for Naïve Bayes Classifier (b) test set result for Naïve Bayes Classifier (c) training set result for Decision Tree Classification (d) test set result for Decision Tree Classification (e) training set result for Random Forest Algorithm (f) test set result for Random Forest Algorithm

The blue points are observations for which the requirement of spectrum handoff (the dependent variable) is 0, i.e., users who are under the coverage of base station 1. The yellow points are observations for which the requirement of spectrum handoff is 1, meaning the user is not under the coverage of base station 1. Thus the blue-point users do not require spectrum handoff, but once such a user crosses the boundary line, a handoff is required; the model captures this prediction well. There are, however, some data points lying in the wrong region; to quantify this error, we use the confusion matrix. The classification shown in Fig. 7(a), (b) and 9(a), (b) is the linear decision boundary produced by logistic regression; the remaining models are non-linear classifiers. The boundary shown in Fig. 7(c), (d) and 9(c), (d) is irregular because the K-NN algorithm classifies each point by its nearest neighbours. It too separates users into the blue region (no handoff required) and the yellow region (handoff required). Although the model shows good results, some yellow and blue points still fall in the opposite regions; this is not a major issue, as tolerating them prevents overfitting. The SVM output shown in Fig. 7(e), (f) and 9(e), (f) is similar: a hyperplane divides the users into the blue and yellow classes.

The Naïve Bayes classifier (see Fig. 8(a), (b) and 10(a), (b)) produces a fine, Gaussian-shaped boundary that segregates the data points well; although its confusion matrix contains some prediction errors, it remains a good classifier. The decision tree classification output shown in Fig. 8(c), (d) and 10(c), (d) differs from the other models: it splits the data with both horizontal and vertical lines along the Distance and Power variables, because the tree tries to capture all the data. The Random Forest output in Fig. 8(e), (f) and 10(e), (f) is very similar to the decision tree classifier. In the Random Forest classifier we used 10 trees, each predicting yes or no for the handoff, and the classifier takes the majority of these predictions as its result. The number of incorrect predictions is minimal and no overfitting issue is observed; changing the number of trees in the classifier would yield different results.

Tables 1 and 2 give the predicted output and the real test output for 100 and 500 users, respectively. Some values in the prediction vector differ from the real vector values; these prediction errors are highlighted in the tables for better understanding. To count the correct and incorrect predictions, we use the confusion matrix, a table whose rows represent the actual classes and whose columns represent the classes predicted by the algorithm, as shown in Figs. 11 and 12. This layout makes it easy to see which predictions are wrong: diagonal entries (true positives and true negatives) correspond to correct predictions, while off-diagonal entries correspond to errors. With the confusion matrix we can measure the quality of the model. In Fig. 11(a), for instance, the confusion matrix shows 0 + 3 = 3 incorrect predictions and 19 + 3 = 22 correct predictions. The numbers of correct and incorrect predictions for 100 and 500 users are shown in Figs. 11 and 12.

Table 1 Predicted output and real test output of various ML algorithms for 100 users (bold items indicate prediction errors)
Table 2 Predicted output and real test output of various ML algorithms for 500 users
Fig. 11

Confusion matrix of various ML algorithms for 100 users (a) Logistic Regression (b) KNN Algorithm (c) SVM Algorithm (d) Naïve Bayes Classifier (e) Decision Tree Classification (f) Random Forest Algorithm

Fig. 12

Confusion matrix of various ML algorithms for 500 users (a) Logistic Regression (b) KNN Algorithm (c) SVM Algorithm (d) Naïve Bayes Classifier (e) Decision Tree Classification (f) Random Forest Algorithm

We analyzed the performance of the ML algorithms in terms of accuracy, precision, sensitivity, specificity, F1_score and the confusion matrix, varying the number of users; the results are presented in Tables 3 and 4. Accuracy measures how often the model is correct overall. Precision is the fraction of predicted positives that are actually positive. Sensitivity measures how well the model identifies actual positives, i.e., the proportion of true positives it recovers, making it useful for assessing the model's ability to predict positive outcomes. Specificity, on the other hand, measures how well the model predicts negative outcomes, taking into account the true negatives and false positives. The F1-score is the harmonic mean of precision and sensitivity; it does not take the true negatives into account. It is observed from Tables 3 and 4 that as the number of test users increases, the number of misclassified values also increases, leading to more errors; the system becomes more efficient with fewer test users and more trained ones.

Table 3 Performance analysis of various ML algorithms for 100 users (bold items show the highest values among the algorithms)
Table 4 Performance analysis of various ML algorithms for 500 users (bold items show the highest values among the algorithms)
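For reference, the sketch below computes the metrics of Tables 3 and 4 directly from confusion-matrix counts; the counts come from the logistic-regression matrix of Fig. 11(a), with the assignment of the cells to TN/FP/FN/TP assumed for illustration.

```python
# Metrics from a confusion matrix; cell assignment is assumed, the totals
# (22 correct, 3 incorrect) match Fig. 11(a).
tn, fp, fn, tp = 19, 0, 3, 3

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)          # recall / true-positive rate
specificity = tn / (tn + fp)          # true-negative rate
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"f1={f1:.3f}")
```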

5.1 Comparison with the Literature

Table 5 shows the performance improvements of the proposed system relative to the literature [30,31,32]. The improvements in precision, sensitivity, specificity, and F1_score are evaluated per ML algorithm. Purab Nandi et al. [30] analyzed the performance of several popular ML algorithms to predict the right values for users; compared to Ref. [30], the proposed system performed significantly better in terms of sensitivity, specificity, and accuracy, with improvements of 10.1%, 25%, and 5.49%, respectively. In a study on cognitive radio networks, Geetanjali et al. [31] presented a novel attack that an intruder can use to access the spectrum, validated by means of a weighted error probability of 80%; relative to that model's accuracy of 92%, the proposed system shows an improvement of 15%. Wajhal Gaurav et al. [32] proposed a machine learning-based prediction method for distributing radio resources across the spectrum; compared with this technique, our proposed system shows good improvements in accuracy, precision and F1_score, with maximum increments of 7.79%, 4.8% and 5.6%, respectively. In some cases the proposed system performed worse than the literature, but overall it shows a significant improvement, which supports our findings.

Table 5 Performance comparison of proposed system with the existing literature

6 Conclusions

This paper presented a framework that improves system performance by applying various machine learning techniques to spectrum handoff: Logistic Regression, the KNN Algorithm, the SVM Algorithm, the Naïve Bayes Classifier, Decision Tree Classification and the Random Forest Algorithm. The system is implemented on a live dataset containing users separated in the power domain using the non-orthogonal multiple access (NOMA) technique, with the data collected through a software-defined radio experimental setup. The ML techniques are compared in terms of accuracy, precision, sensitivity, specificity, F1_score and the confusion matrix. It is observed that as the number of test users increases, the number of prediction errors also increases, making the system less efficient; with more trained users and fewer test users, the system becomes more efficient. The proposed system performs worse than the literature in some cases; overall, however, it shows a significant improvement, which supports our findings.