1 Introduction

The demand for high data rates has become a very important factor in the last decade. Moreover, the next generations of communication systems should provide high bit rates. The old single-carrier systems are unable to handle these demands. Multicarrier modulation systems can achieve the high data rates [15]. Before the revolution of digital systems, multicarrier systems could not be implemented with high accuracy using the analog components. Now, the possible implementation of digital chips brings the possibility of fabricating multicarrier systems within reach. The most important multicarrier system is the OFDM system, which can be fabricated using some IFFT chips. Although OFDM is simple in implementation, it still suffers from a main downside. Thus, the output peak power is high compared with the average power, and this problem is well known as the peak-to-average power ratio (PAPR) [30]. High PAPR results in intermodulation distortion, which weakens the multicarrier overall performance.

Different schemes have been introduced in the literature to consider the PAPR problem. The simple approach is by clipping the large amplitudes [28]. The PAPR can also be reduced by pre-coding [22]. Other schemes have been suggested, such as partial transmit sequences (PTS) [26], selected mapping (SLM) [6], and other methods [16]. However, clipping the magnitude is a non-linear operation; thus, out-of-band and in-band distortions will be observed. Therefore, the bit error rate (BER) performance will degrade. However, in one of our previous studies, the BER degradation due to clipping was eliminated completely [34]. Coding is sufficient but at the expense of the bit rate. PTS and SLM are probabilistic techniques; therefore, no BER degradation will be noticed, but the cost is the large increment in the computational complexity of the system. Hence, there is a trade-off between the BER and the computational complexity. In this paper, the work concentrates on one of the most efficient and distortion-less techniques, the SLM method, and a novel scheme called the post IFFT-modified SLM (PI-MSLM) technique is proposed. PI-MSLM has the capability to reduce both the PAPR and the computational complexity, as explained in the next sections.

The conventional SLM (CSLM) approach must copy the data sequence \(\beta \) times; then, each copy is multiplied component-wise by a corresponding phase rotation vector (PRV). All branches are fed to \(\beta \)-IFFT blocks in parallel. The branch that shows the lowest PAPR will be adopted for transmission. Thus, the PRV has an important role in the reduction of the PAPR. It is important to mention here that the PAPR value will be reduced by increasing the number of PRVs, \(\beta \), and, consequently, the computational complexity will be increased accordingly. Thus, a lower bound of PAPR reduction can be obtained for a given level of complexity [20]. This bound depends on \(\beta \). To augment the functionality of conventional SLM (CSLM), many approaches have been suggested [1232]. The literature can be divided into two groups: the frequency domain and the time domain. In these two groups, CSLM has been modified to achieve a certain objective.

In the frequency domain, the dominant modification was applied on the PRV types. Thus, to achieve an optimal PAPR reduction gain, two conditions were used [12, 13] for the PRVs: the orthogonality and the periodicity of the PRVs set. Therefore, Hadamard sequences were the best choice as PRVs [12, 13]. Moreover, the best PRVs should have phases of 0 and \(\pi \) with equal probabilities [41]. Taher et al. [35] introduced a technique that slides a constant PRV selected from Hadamard sequences over the data sequence; the PAPR, the computational complexity, and the side information were reduced dramatically. A technique that enabled the receiver to recover the data without side information was introduced in [1]; in other words, the PRVs have the ability to make the receiver remove the side information, the cost of which was paid in terms of the degradation in the BER performance because this method is applicable to small constellation mappings. However, non-unity power PRVs were proposed [5, 25], and normalized Riemann sequences were also used [19], but such sequences will change the minimum constellation distance, which leads to a degradation in the BER performance accordingly. Moreover, binary chaotic sequences [29] were implied in a technique that generates PRVs from each other, but the PAPR performance was degraded. In addition, the receiver had a share of the developments jointly with the transmitter side, where the PRVs were designed in a way that made the receiver capable of recovering the data without side information, but the complexity of the receiver was increased in addition to the degradation of the BER performance [17]. The new PRVs used in [17], which improved the PAPR to a level higher than that of the low BER, were presented in [18], wherein more freedom in the pilot phase sequences was explored. Furthermore, other types of PRVs, such as pseudo-interferometry sequences [38], Fountain rotating vectors [33], Chu PRVs [24], and a class of perfect PRVs that reduces the PAPR [8], were examined. However, their performance is poorer than that of the CSLM technique, such that, [38] degrades the PAPR and increases the computational complexity, [33] increases the computational complexity, [24] degrades the BER performance, and [8] increases the side information and the computational complexity.

To reduce the computational complexity, further developments have been reported, where two PRVs will be generated in the frequency domain, and the other candidates will be generated in the time domain; that is, the IFFT blocks are consequently reduced [23]. In the same context, when the circularly shifted PRVs are first multiplied by the data sequences (in the frequency domain), the candidates can be symbolized as a weighted sum of the circularly shifted OFDM time-domain samples. As a result, only one IFFT block is required, but the complexity of the receiver is increased as a cost. Thus, it is clear that the key idea to reduce the complexity is by reducing the IFFT operations; therefore, more conversion matrices have been developed in [9, 10], where only one IFFT chip remains in the system, but the cost was the BER performance degradation. Then, the authors in [9, 10] improved the BER performance [11], but it was still a drawback of their technique. A further reduction in the computational complexity can be found by completely implementing SLM in the time domain. In this way, only one IFFT block may be used. An example is the method submitted in [24]. Soo et al. [21, 39] combined cyclically delayed time-domain symbols, where a \(50~\%\) reduction in the computational complexity was accomplished, with some BER performance degradation. The time-domain symbol combining (TDSC) technique is another method presented in [34], which can also be considered as a post-IFFT-modified SLM technique. TDSC reduces both the PAPR and the complexity of the system. One of the notable proposed modifications to the CSLM technique was reported in [40], wherein a time-domain selective filtering was used to obtain multiple scrambled OFDM symbols. The OFDM symbol with the lowest PAPR was selected for transmission, and the required number of IFFT blocks was maintained as only one block. This method was found to be very sensitive to the multipath channels. A notable technique was introduced in [32]. This technique generates PRV candidates in the time domain. However, its disadvantage is the degradation in the BER performance.

These findings motivated us to introduce the PI-MSLM approach, which achieves a more significant PAPR reduction compared with that of CSLM. Our proposed technique follows the same procedure as that of CSLM, which results in a significant reduction in the computational complexity.

2 Principles of OFDM and SLM

An OFDM signal of N-subcarriers modulated by N-data samples drawn randomly from a multi-level baseband modulation such as M-quadrature amplitude modulation (M-QAM) or M-phase shift keying modulation (M-PSK) can be formed as [20]

$$\begin{aligned} x_{n}=\frac{1}{\sqrt{N}}\sum ^{N-1}_{k=0}X_{k}e^{j2\pi \frac{kn}{N}} \qquad 0\le n \le N-1, \end{aligned}$$
(1)

where \(X_{0}\), \(X_{1}\ldots X_{N-1}\) are the data sequence. Due to the combination of the in-phase output samples, \(x_{n}\), a high PAPR will be created. The PAPR can be calculated using the following expression [6]:

$$\begin{aligned} \mathrm{PAR}=\frac{\mathrm {max_n}\{|x_{n}|^{2}\}}{E\{|x_{n}|^{2}\}}, \end{aligned}$$
(2)

where \(E\{\cdot \}\) is the expectation operation. We will use PAR in decibels rather than its rational value as \(10\log _{10}\)PAR. However, the signal \(x_{n}\) is a discrete and may lose some of the high PAPR events because it is sampled with a Nyquist range; therefore, it must be upsampled by at least four times the Nyquist value [37]. To monitor the PAPR performance, the complementary cumulative distribution function should be used:

$$\begin{aligned} \Pr \{\mathrm{PAR}_\mathrm{low} >\gamma \}=\left( \Pr \left\{ \mathrm{PAR} > \gamma \right\} \right) ^N, \end{aligned}$$
(3)

where \(\gamma \) must not be exceeded because it is the clipping threshold. As mentioned above, we concentrated on the SLM approach; therefore, it will be discussed in the next subsections in detail. SLM [6] creates \(\beta \) candidates of \(X_{k}\) data sequences, which are multiplied component-wise by the \(\beta \) PRVs. Then, the results are fed to the bank of the IFFT blocks (which is the CSLM operation). Mathematically, the CSLM operation can be expressed as

$$\begin{aligned} \mathbf {X} ^{u}=\mathbf v ^{u}\times \mathbf X , \end{aligned}$$
(4)

where

$$\begin{aligned} \mathbf X ^{u} = \begin{pmatrix} X_{0}^{u} \\ X_{1}^{u} \\ \vdots \\ X_{N-1}^{u} \\ \end{pmatrix}\end{aligned}$$
(5a)
$$\begin{aligned} \mathbf v ^{u} = \begin{pmatrix} v_{0}^{u} &{} 0 &{} \cdots &{} 0 \\ 0 &{} v_{1}^{u} &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} v_{N-1}^{u} \end{pmatrix}\end{aligned}$$
(5b)
$$\begin{aligned} \mathbf X = \begin{pmatrix} X_{0} \\ X_{1} \\ \vdots \\ X_{N-1} \end{pmatrix}. \end{aligned}$$
(5c)

In (5b), \(\mathbf v ^{u}\) is the \(u\)th PRV, and \(u = 1, 2\cdots \beta \). The PAPR should be determined for each branch, i.e., PRV \(\mathbf v ^{u}\). The branch with the minimum PAPR is chosen for the transmission, along with its corresponding rotating vector index \(\hat{u}\) as side information. Thus, the output time-domain OFDM symbol can be expressed as

$$\begin{aligned} \mathbf {x} ^{\hat{u}}=\mathbf W _{N}\times \lbrace \mathbf v _{\hat{u}}\times \mathbf X \rbrace , \end{aligned}$$
(6)

where \(\hat{u}\) represents the index of the PRV that produces the lowest PAPR value and \(W_{N}\) is the IFFT matrix. The amount of side information bits is given as [7]

$$\begin{aligned} \mathrm{SI}_\mathrm{CSLM}=\log _{2}\beta . \end{aligned}$$
(7)

From [7], the number of multiplication operations can be determined as

$$\begin{aligned} \fancyscript{M}_\mathrm{CSLM}=2\beta N \log _{2}N, \end{aligned}$$
(8)

whereas the number of addition operations is

$$\begin{aligned} \fancyscript{A}_\mathrm{CSLM}=4\beta N \log _{2}N. \end{aligned}$$
(9)

The advantages of our proposed technique in terms of significant reductions in PAPR and in the computational complexity and side information can be explained now. In addition, the PRV is chosen from the Hadamard matrix of length \(N/r\) because the Hadamard sequences were found to be the optimum choice [12, 13, 19].

3 Proposed Approach

CSLM has been introduced in detail in the previous section; here, we discuss the proposed method. Unlike CSLM, our method uses the time-domain data vector to reduce the PAPR and the computational complexity, and the side information will be reduced dramatically; therefore, it is called post-IFFT-modified SLM (PI-MSLM). PI-MSLM uses only one PRV [35] after the IFFT block of the original OFDM system. Moreover, the length of the PRV is a small part of the OFDM total length, \(N\), such that it is only \(w = N/r\) where \(r < N\) where \(r\) is the length control of the PRV. In other words, the proposed PRV, \(\mathbf v \), is

$$\begin{aligned} \mathbf v = \begin{pmatrix} 1 &{} 0 &{} 0 &{} \cdots &{} 0 \\ 0 &{} -1 &{} 0 &{} \cdots &{} 0 \\ 0 &{} 0 &{} 1 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} 0 \\ 0 &{} 0 &{} 0 &{} \cdots &{} -1 \end{pmatrix}. \end{aligned}$$
(10)

Thus, \(\mathbf v \) is a succession of negation signs. The length of \(\mathbf v \) is \(w = N/r\). For this reason, not all of the data vector will be modified. Figure 1 shows the principle of work of our new method.

Fig. 1
figure 1

Structure of PI-MSLM method to reduce the PAPR of the OFDM symbol

According to Fig. 2, it is clear that the vector \(\mathbf v \) will slide over the data vector, where these data are the output samples of the IFFT block. Thus, the OFDM symbol can be expressed as [36]

$$\begin{aligned} \mathbf y =\mathbf v\,+\,x \end{aligned}$$
(11)

or equivalently

$$\begin{aligned} \mathbf y =\mathrm{IDFT}\left( \mathbf V +\mathbf X \right) . \end{aligned}$$
(12)

In fact, the vector \(\mathbf V \) (or \(\mathbf v \) in time domain) is smaller than the data vector as stated previously. That is, \(\mathbf V \) should be extended to the length \(N\) by adding zeros, such that the number of non-zero elements is \(N/r\). It is obvious that the magnitude values have been changed; thus, the constellation points have been changed, and the system will require more power for transmission. To overcome this problem, the average power should be maintained at the original level, which can be achieved through power normalization. However, before we proceed, it is important to answer the following question: Why the proposed method will reduce the PAPR? To answer the question, it is important to show mathematically how the PI-MSLM scheme could reduce the PAPR. In the next subsections, a derivation based on the additive theory given in [36] will be explained. Indeed, the derivations in [14] were very useful in our job to find the mathematical model for the PI-MSLM; the only difference is that the approach in [14] is for a single candidate signal and does not need to send side information. In fact, the method that introduced in [14] achieves a constellation reshaping through time-domain transformation, while in the PI-MSLM, there will be \(\left( N/r\right) \)-candidate signals, and some bits of side information will be sent to the receiver to recover the data.

Fig. 2
figure 2

Sliding fashion for the PI-MSLM method

However, if we look at Eq. (11) one more time, then we note that the PAPR of \(\mathbf y \) has to be less than the PAPR of \(\mathbf x \), in other words [14, 36]:

$$\begin{aligned} \mathrm{PAR}_{y}<\mathrm{PAR}_x, \end{aligned}$$
(13)

or using Eq. (2),

$$\begin{aligned} \frac{\max (P_{y})}{P_{y_\mathrm{avg}}}<\frac{\max (P_x)}{P_{x_\mathrm{avg}}} \end{aligned}$$
(14)

but if

$$\begin{aligned} \max (P_{y})=\max (P_x)+P_a \end{aligned}$$
(15)

and

$$\begin{aligned} P_{y_\mathrm{avg}}=P_{x_\mathrm{avg}}+P_b \end{aligned}$$
(16)

with

$$\begin{aligned} \frac{P_a}{P_b}<\frac{\max (P_x)}{P_{x_\mathrm{avg}}}, \end{aligned}$$
(17)

then the \(\mathrm{PAPR}_{y}\) will always be less than the \(\mathrm{PAPR}_x\); thus, Eq. (13) is always holds. Hence, the operation in Eq. (11) can be written as

$$\begin{aligned} \mathbf y =\mathbf x +\mathbf c \, \rho , \end{aligned}$$
(18)

where \(\mathbf c \) is a vector of sign negation ones, \(\mathbf c =\left[ \ldots ,0, 0, +1, -1, +1, -1, \ldots \right] \), in other words, a zero value in \(\mathbf c \) stands for adding nothing (the corresponding sample will be not changed), and \(\rho \) depends on the time-domain data sequence \(\mathbf x \) such that

$$\begin{aligned} \rho _n=\frac{x_n}{\mid x_n \mid }=e^{j{\theta }_n}. \end{aligned}$$
(19)

Thus,

$$\begin{aligned} P_{y_n}&=y_n y^{*}_n \nonumber \\&=\left( x_n+c\rho _n\right) \left( x_n+c\rho _n\right) ^{*} \nonumber \\&=\mid x_n\mid ^2+\,c^2+2c\mid x_n \mid \!. \end{aligned}$$
(20)

Hence,

$$\begin{aligned} \max (P_{y})=\max (P_x)+c^2+2c\sqrt{\max (P_x)} \end{aligned}$$
(21)

for instance, assuming \(\mathbf c \) is all zeros vector; then, Eq. (21) will show \(\max (P_{y})=\max (P_x)\); in other words, nothing has been added to the signal, from which

$$\begin{aligned} P_a=c^2+2c\sqrt{\max (P_x)}. \end{aligned}$$
(22)

The same procedure can be evaluated for the average power to get

$$\begin{aligned} P_{y_\mathrm{avg}}=P_{x_\mathrm{avg}}+c^2+2c\left[ \frac{1}{N}\sum _{n=0}^{N-1}\vert x_n \vert \right] . \end{aligned}$$
(23)

Then,

$$\begin{aligned} P_b=c^2+2c\left[ \frac{1}{N}\sum _{n=0}^{N-1}\vert x_n \vert \right] . \end{aligned}$$
(24)

Substituting Eqs. (22) and (24) in Eq. (17),

$$\begin{aligned} \frac{\max (P_x)}{P_{x_\mathrm{avg}}}>\frac{c^2+2c\sqrt{\max (P_x)}}{c^2+\frac{2c}{N}\sum _{n=0}^{N-1}\vert x_n \vert } \end{aligned}$$
$$\begin{aligned} \max (P_x)\left[ c^2+\frac{2c}{N}\sum _{n=0}^{N-1}\vert x_n \vert \right] >P_{x_\mathrm{avg}}\left[ c^2+2c\sqrt{\max (P_x)} \right] \end{aligned}$$
$$\begin{aligned} c^2\left( \max \left( P_x\right) -P_{x_\mathrm{avg}} \right) +2c\left[ \frac{\max (P_x)}{N}\sum _{n=0}^{N-1}\vert x_n\vert -P_{x_\mathrm{avg}}\sqrt{\max (P_x)} \right] >0 \end{aligned}$$
$$\begin{aligned} \frac{\max (P_x)}{N}\sum _{n=0}^{N-1}\vert x_n\vert -P_{x_\mathrm{avg}}\sqrt{\max (P_x)}>0 \end{aligned}$$
$$\begin{aligned} \frac{\sqrt{\max (P_x)}}{N}\sum _{n=0}^{N-1}\vert x_n\vert >P_{x_\mathrm{avg}} \end{aligned}$$
$$\begin{aligned} \sqrt{\max (P_x)}\sum _{n=0}^{N-1}\vert x_n\vert >\sum _{n=0}^{N-1}\vert x_n\vert ^2. \end{aligned}$$
(25)

The last expression holds true always, because of that \(\max (P_x)\ge \vert x_n\vert \) for all \(n\). Thus, the proposed approach will reduce the PAPR. Up to this point, the answer for the previous question is done.

As it was shown, the operation of (11) will conduct the system to ask for more power for the transmission operation. To maintain the same power of the original OFDM symbol, a normalization factor can be defined as

$$\begin{aligned} \delta =\frac{\sqrt{N}}{\parallel \mathbf y \parallel }. \end{aligned}$$
(26)

However, PI-MSLM clearly must tell the receiver which shift index was used; thus, side information must be sent to the receiver. The next section will analyze the computational complexities, and how to determine the amount of side information.

4 Computational Complexity Analysis

As mentioned, the receiver must know the side information to be able to recover the data. Actually, there are two parameters which represent the side information, \(s\) and \( r\). The PI-MSLM proposes that \(r\) must be pre-defined at both sides (transmitter and receiver). Thus, only the shift index, \(s\), will be sent as side information, while the length control parameter \(r\) of the PRV will be available at the initialization of the operation. Hence, the amount of the side information bits can be calculated depending on the number of shifts. In our design, we assumed that it will shift \(N\) times as a maximum; therefore, the number of bits reserved as side information will be

$$\begin{aligned} \lambda _{\hbox {PI-MSLM}}&=\log _{2}\left[ \max (s)\right] \nonumber \\&=\log _{2}N, \end{aligned}$$
(27)

while for the traditional SLM scheme, the side information will represent the index of the phase rotation vector that produces the lowest PAPR as given in (7).

In the literature, an efficient phase rotation sequences were used in [19] called Riemann sequences. However, Riemann sequences were used instead of Hadamard sequences; for this reason, the required side information did not change as for the traditional SLM. Furthermore, the computational complexity also did not change as that of the CSLM. However, during the extensive computer simulations, it was found out that the number of Riemann sequences that required to outperform the CSLM must be larger than the number of PRV of the CSLM; this can be seen in [19] and in the results section of this paper. As a comparison between CSLM and Riemann-SLM (RiSLM), they both deal with the data sequence in the frequency-domain side, while the PI-MSLM needs only one PRV and deals with the data sequence in the time-domain side.

Other approaches were proposed in [8] to reduce the PAPR. However, it was explained in the introduction that a slight loss in term of PAPR reduction gain was noticed, as will be shown in the results of this paper. The side information for the suggested approaches in [8] depends on the phase rotation vector type; thus, there are three proposed schemes: proposed scheme I (PSI), proposed scheme II (PSII), and proposed scheme III (PSIII). Table 1 summarizes all the side information for the CSLM, PI-MSLM, RiSLM, PSI, PSII, and PSIII. Furthermore, Table 1 shows the numbers of additions and multiplications as well.

Table 1 Computational complexities comparison between the PI-MSLM, CSLM, and the fashions in [8, 19]

However, applying the PI-MSLM, the computational complexity will be reduced dramatically because the number of IFFT blocks is reduced to only one block, unlike conventional SLM, which requires more than one IFFT block (\(\beta \)). As a result, the number of multiplication operations is only the number of multiplications of the main IFFT block of the OFDM system itself

$$\begin{aligned} \fancyscript{M}_{\text{ PI-MSLM }}=\frac{N}{2} \log _{2}N. \end{aligned}$$
(28)

The last expression shows that the number of multiplication operations is only the number of multiplication operations in the original OFDM system without using the SLM approach, which is a large reduction gain. However, the number of addition operations is that of one IFFT block and that of the sliding operation:

$$\begin{aligned} \fancyscript{A}_{\text{ PI-MSLM }}=N\left( \frac{s}{w} +\log _{2}N\right) . \end{aligned}$$
(29)

In the last expression, the shift index, \(s\), has a maximum value equal to \(N\); therefore, to find the maximum number of addition operations, we have to substitute \(N\) in Eq. (29):

$$\begin{aligned} \fancyscript{A}_\mathrm{PI-MSLM}^{\mathrm {max}}&=N\left( \frac{s}{w}+\log _{2}N\right) \nonumber \\&=N\left( \frac{\mathrm {max}(s)}{w}+\log _{2}N \right) \nonumber \\&=N\left( \frac{N}{\frac{N}{r}}+\log _{2}N \right) \nonumber \\&=N\left( r+\log _{2}N \right) , \end{aligned}$$
(30)

that is, the number of additions operations does not depend on the number of shifts, and it depends on the length of the PRV.

Table 1 summarizes the computational complexities of the proposed method compared with the traditional selected mapping scheme and the suggested fashions in [8, 19].

5 PAPR Distribution Analysis

The analytic formulation of the CCDF for the PI-MSLM scheme can be determined using the same procedure given in [36]. The starting point will be the polar form of the signal after applying the PI-MSLM fashion, with the use of the normalization factor (see 26); then

$$\begin{aligned} y_n=\delta \left( \mid x_n\mid +c\right) e^{\left( j\theta _n\right) }, \end{aligned}$$
(31)

since the probability density function of the normalized \(x_n\) is given as [27]

$$\begin{aligned} f_{t_n}=2t e^{-t^2}, \end{aligned}$$
(32)

where

$$\begin{aligned} t_n=\frac{\vert x_n\vert }{\sqrt{P_{x_\mathrm{avg}}}} \end{aligned}$$
(33)

and for \(y_n\),

$$\begin{aligned} y_n=\delta \left( \vert x_n\vert +c\right) e^{j\theta _n}, \end{aligned}$$
(34)

since \(t_n\) are i.i.d. Rayleigh random variables; the probability density function can be expressed as

$$\begin{aligned} f_{t_{n}}(t)=2t e^{-t^2}. \end{aligned}$$
(35)

The cumulative distribution function CDF will be

$$\begin{aligned} F_{t_{n}}(t)=1-e^{-t^2}. \end{aligned}$$
(36)

The same derivations can lead to

$$\begin{aligned} F_{\acute{t}_{n}}(\acute{t})=1-e^{-\acute{t}^2}, \end{aligned}$$
(37)

where

$$\begin{aligned} \acute{t}_{n}=\frac{\vert y_n\vert }{\sqrt{P_{y_\mathrm{avg}}}}. \end{aligned}$$
(38)

Substituting (23), (26) in (38) yields

$$\begin{aligned} \acute{t}_{n}=p\,t_n+q, \end{aligned}$$
(39)

where

$$\begin{aligned} p=\frac{\sqrt{P_{x_\mathrm{avg}}}}{\sqrt{P_{y_\mathrm{avg}}}} \end{aligned}$$
(40)

and

$$\begin{aligned} q=\frac{c}{\sqrt{P_{y_\mathrm{avg}}}}. \end{aligned}$$
(41)

Thus, the probability density function of \(\acute{t}_n\) is

$$\begin{aligned} f_{\acute{t}_n}\left( t\right) =\frac{1}{\vert p\vert } f_{t_n}\left( \frac{t-q}{p}\right) . \end{aligned}$$
(42)

Then, the cumulative distribution function will be

$$\begin{aligned} F_{\acute{t}_n}\left( t\right) =1-e^{-\left( \frac{t-q}{p}\right) ^2},\qquad t>q. \end{aligned}$$
(43)

However, we are interested to find the probability of \(\acute{t}_n\) to be less than \(t\); therefore, such probability can be expressed as follows:

$$\begin{aligned} F_c(t)&=\Pr \left\{ \max \left\{ \acute{t}_n\right\} <t\right\} \nonumber \\&=\left[ 1-e^{-\left( \frac{t-q}{p} \right) ^2 }\right] ^N. \end{aligned}$$
(44)

Let \(\zeta =t^2\), and the cumulative distribution function of the PAPR will take the form

$$\begin{aligned} F_{P}\left( \zeta \right) =\left[ 1-e^{-\left( \frac{\sqrt{\zeta }-q}{p}\right) ^2 }\right] ^N,\qquad \zeta >q^2. \end{aligned}$$
(45)

Then, the CCDF can be written as

$$\begin{aligned} C_{P}\left( \zeta \right)&=1- F_{P}\left( \zeta \right) \nonumber \\&=1-\left[ 1-e^{-\left( \frac{\sqrt{\zeta \,P_{y_{avg}}}-c}{\sqrt{P_{x_\mathrm{avg}}}}\right) ^2}\right] ^N,\qquad \zeta >q^2, \end{aligned}$$
(46)

but for the PI-MSLM, there are \(N\) shifts (candidates); then, the CCDF will be

$$\begin{aligned} C_{P}\left( \zeta \right) =\left[ 1-\left[ 1-e^{-\left( \frac{\sqrt{\zeta \,P_{y_\mathrm{avg}}}-c}{\sqrt{P_{x_\mathrm{avg}}}}\right) ^2}\right] ^N\right] ^{N}, \end{aligned}$$
(47)

where the exponent of the outer bracket, \(N\) represents the number of the candidate signals

6 Results and Discussion

The proposed method, PI-MSLM, prevents the need for more than one IFFT block, which usually does not occur with CSLM. In this section, we will conduct two different simulation scenarios, which are given in Table 2. The two scenarios are identical but the PRV for the first scenario is \(w=N/8\) and the second scenario used \(w=N/4\).

Table 2 Simulation parameter settings for the two scenarios

As shown in Table 2, there are two different sizes for the IFFT block, \(N=64\) and \(N=256\). Each \(N\) size has two mapping types, \(16\)-QAM and \(16\)-PSK. Thus, there are eight subscenarios, the first \(4\)-subscenarios (ss-\(1.1\), ss-\(1.2\), ss-\(1.3\), and ss-\(1.4\)) represent the first scenario which corresponds to \(w=N/8\). The last \(4\)-subscenarios (ss-\(2.1\), ss-\(2.2\), ss-\(2.3\), and ss-\(2.4\)) represent the second scenario with \(w=N/4\), such that ss-\(1.1\) shows a comparison between the original OFDM signal (before applying any PAPR reduction scheme), PI-MSLM, CSLM, RiSLM, PSI, PSII, and PSIII as shown in Fig.  3. In Fig.  3, it is shown that the original PAPR equals to \(9.7\) dB, while \(8.7 \) dB, \( 7.72 \) dB, \( 2.4 \) dB, \( 8.85 \) dB, \( 8.73 \) dB, \( 8.5 \) dB, and \( 7.9 \) dB for the PI-MSLM, SLM, RiSLM\(_{64}\), RiSLM\(_{16}\), PSI, PSII, and PSIII, respectively, as shown in Table 3.

Fig. 3
figure 3

Comparison for the PAPR results for \(16\)-QAM modulation, and \(w=0.125\,N\) (\(N=64\))

Table 3 Simulation results for ss-[\(1.1\), \(1.2\), \(1.3\), and \(1.4\)] (\(w = 0.125N\))

Table 3 shows that the PAPR of the PI-MSLM is 8.7 dB; thus, it is reduced from 9.73 to 8.7 dB. The corresponding number of multiplication operations is 192, while the addition operations were 392 operations, and the number of reserved bits for side information was 2 bits. The conventional SLM scheme shows better performance in terms of PAPR reduction gain, where the PAPR was reduced down to 7.72 dB; however, the computational complexity is the cost paid for this gain, where the number of multiplication operations was 75 \(\%\) more than the number of the PI-MSLM scheme. Moreover, the number of addition operations also increased by 74.5\(~\%\) with respect to the operations of the PI-MSLM, and the side information did not change. The same SLM fashion can employ any type of PRVs; however, Riemann sequences were used in [19]. However, it is shown that the computational complexity seems to be the same as that of the original SLM scheme. However, Riemann sequences are not efficient as Hadamard sequences [12, 13]. Therefore, more candidate signals are required to achieve the same performance of the conventional SLM as stated in Table 3, Figs. 2, and 3 of [19]. Thus, the computational complexity will be increased as long as the side information bits. In our simulation, we have selected \(\beta = 64\) and \(\beta = 16\); for the first case \(\beta = 64\), the number of multiplication operations was the largest (\( \fancyscript{M}= 12288\)) among all other schemes along with \( \fancyscript{A} = 24576 \) which is also the largest number among the other approaches. In spite of this, the PAPR reduction gain was the largest also among all of the other schemes. On the other hand, when \(\beta = 16\), the computational complexity becomes within the normal ranges as shown in Table 3, but the PAPR reduced less than the conventional SLM; the reason was explained in the introduction section, and the reader can read more in [12, 13].

PI-MSLM was compared with PSI, PSII, and PSIII [8]. Figure 3 shows that PSI is worst than PI-MSLM by around 0.03 dB which is a negligible difference. However, the number of reserved bits for side information and \( \fancyscript{A}\) was the cost, where \( \fancyscript{A}= 1088\), i.e., \(\fancyscript{A}\) of the PSI scheme is around 64 \(\%\) more than that of the PI-MSLM. PSII shows better performance than PSI, where the PAPR reduced to 8.5 dB, 4-bits for side information, but the drawback was the increment in the computational cost. PSIII performed better than PSI and PSII in terms of the PAPR reduction gain. However, the number of bits for side information was larger than the others, 18-bits, and one more cost is that there is an increment of 66 \(\%\) in the number of addition operations as shown in Table 3.

The previous comparison was conducted when \(N = 64\). Thus, subscenario 1.2 (ss-1.2) is identical to ss-1.1 but with different IFFT sizes, where \(N\) will be \(256\). Figure 4 draws all the results for this subscenario and for more convenience, Table 3 explains the same results of Fig. 4 but with more details. However, from Fig.  4, the original PAPR is 12.33 dB , that is, there are 1.2, 1.72, 6.94, 0.81, 1.25, 1.5, and 2 dB PAPR reductions for PI-MSLM, SLM, RiSLM\(_{64}\), RiSLM\(_{16}\), PSI, PSII, and PSIII, respectively. Thus, a similar behavior was recognized as in ss-1.1. However, RiSLM with 64 candidates has the best reduction gain but at the expense of the computational complexity. It is shown in Table 3 that there is an increment of 98.4 \(\%\) in the addition operations with respect to the PI-MSLM fashion. PI-MSLM outperforms RiSLM when \(\beta = 20\) in terms of PAPR, \(\fancyscript{A}\) and \(\fancyscript{M}\) along with the side information bits, where two bits reserved for side information for the PI-MSLM and 5 bits for RiSLM for \(\beta = 20\). Accordingly, it is concluded that the traditional SLM scheme outperforms RiSLM with respect to the computational complexity; at the same time, the traditional SLM performs better than the PI-MSLM but at the expense of the computational complexity (see Table 3). On the other hand, PI-MSLM outperforms PSI in terms of the PAPR, \(\fancyscript{A}\), \(\fancyscript{M}\), and the number of bits of side information, while both PSII and PSIII outperform PI-MSLM but at the expense of the large amounts of \(\fancyscript{A}\), \(\fancyscript{M}\), and the number of bits of side information.

Fig. 4
figure 4

Comparison for the PAPR results for \(16\)-QAM modulation, and \(w=0.125\,N\) (\(N=256\))

The two previous subscenarios (ss-1.1 and ss-1.2) depict the results for two different IFFT sizes but both modulated using 16-QAM. In the next subsection, the same IFFT sizes of ss-1.1 and ss-1.2 will be repeated but for the 16-PSK family in ss-1.3 and ss-1.4.

However, it was found out that PI-MSLM has more capability to reduce the PAPR if the OFDM symbol was originally utilized the M-PSK family. Figures 5 and 6 state the results for ss-1.3 and ss-1.4, respectively. Table 3 explains the results of Figs. 5 and 6, respectively, with more details. Thus, from Fig. 5, the original PAPR is 10.37 dB, and it is reduced after applying PI-MSLM by 2.28 dB where it is better than SLM, PSI, PSII, and PSIII in terms of PAPR reduction gain. Furthermore, the computational complexity of the PI-MSLM is lower than the other schemes as shown in Table 3. RiSLM performs better but at the expense of the increased complexity. The same behavior was deduced from Fig. 6, but the difference is the IFFT size (\(N=256\), see ss-1.4). However, in Fig. 6, the original PAPR was 12.54 dB, while it was 6.79 dB after applying the PI-MSLM scheme. That is, PI-MSLM outperforms all of the other schemes in Fig.  6 except the case of the RiSLM (\(\beta = 64\)), but the numbers of addition and multiplication operations are large compared with the PI-MSLM. The first scenario has been discussed, and it is time to discuss the second scenario which includes four subscenarios: ss-2.1, ss-2.2, ss-2.3, and ss-2.4. In fact, the second scenario is identical to the first scenario but the only one difference is the PRV length, \(w\). However, the length of the PRV will be increased to be \(w=N/4\). It is shown in Tables 4, and 5 that the PAPR has been enhanced significantly. Furthermore, the computational complexity also was enhanced. However, the drawback is the degradation in the bit error rate (BER) performance. Thus, Figs. 7 and 8 show the BER performance for ss-1.1, and ss-2.1, respectively. To reduce the large number of figures, which may lead the reader to confusion, we limited the results of the second scenario on tables only. We put only two BER performance comparison figures: the first figure represents ss-1.1, and the second stands for ss-2.1. The rest of BER results are summarized in Table 6.

Fig. 5
figure 5

Comparison for the PAPR results for \(16\)-PSK modulation, and \(w=0.125\,N\) (\(N=64\))

Fig. 6
figure 6

Comparison for the PAPR results for \(16\)-PSK modulation, and \(w=0.125\,N\) (\(N=256\))

Fig. 7
figure 7

BER performance comparison for \(16\)-QAM modulation, and \(w=0.125\,N\) (\(N=64\))

Fig. 8
figure 8

BER performance comparison for \(16\)-PSK modulation, and \(w=0.25\,N\) (\(N=64\))

Table 4 Simulation results for ss-[\(2.1\), \(2.2\), \(2.3\) and \(2.4\)] (\(w = 0.25N\))
Table 5 PAPR results of scenario 2 in dB (\(w=0.25N\))
Table 6 BER performance evaluation of all cases

That is, the important notice is that if the PRV size increased from \(N/8\) up to \(N/4\), the PAPR will be enhanced but the BER performance will be degrade. In general, PI-MSLM has the disadvantage of the slight degradation in the BER. Figure 7 states that the signal needs 0.6 dB more than the original signal for \(w=N/8\), while for the larger PRV size, \(w=N/4\), the required extra power is 2.2 dB. However, Table 6 summarizes the BER performance of the PI-MSLM compared with the original signal. It is shown that PI-MSLM of PRV size \(w=N/8\) performs better for the case of large PRV (\(w=N/4\)). Thus, for ss-1.1, the SNR difference was 0.6 dB, while for its corresponding case (ss-2.1, \(w=N/4\)), the required extra SNR was 2.2 dB. For ss-1.2, the difference is 0.8 dB, while for ss-2.2, the difference is 2.1 dB. For the third subscenario, ss-1.3, the difference is 0.7 dB but for its corresponding subscenario, ss-2.3 needs 2.5 dB more power. The last case required 0.5 dB for \(w=N/8\), while its corresponding case requires 2.2 dB.

7 Conclusions

This study introduced a novel technique called PI-MSLM. The proposed technique shifts a vector of 1’s of successive negation signs of length \(N/8\) over the OFDM symbols to change the magnitude of the amplitudes. At each shift stepsize, the PAPR is calculated, and the OFDM symbol with the lowest PAPR is sent to the receiver with its corresponding shift index as side information. This study focused on reducing the computational complexity, which was reduced dramatically, and PAPR reduction compared with the CSLM. It is possible to use this method without restrictions on the type of the baseband modulation, such as \(M\)-QAM or \(M\)-PSK.