1 Introduction

Recent trends in storage application development show that researchers must achieve breakthroughs in the energy consumption of distributed storage systems [1,2,3,4,5,6,7]. In distributed storage applications, a redundant array of inexpensive disks (RAID) consumes considerable energy when striping data and parity blocks across storage nodes [1, 3,4,5,6]. RAID enables a storage application to tolerate node failures. First, RAID splits data chunks into data blocks at the file-system level. Then, RAID generates redundant data blocks from the total number of data blocks. Finally, RAID performs simultaneous physical writes at the kernel level.

In Table 1, the specifications of the proposed storage application are compared with those of existing storage applications in terms of the applied erasure codes, storage network, disk array, and the I/O level and process type used for dynamic power management (DPM). A DPM scheduler enables efficient usage of storage devices by switching a device to a lower-power mode while the storage system is not performing any I/O operation. In the works of Son et al. [9,10,11], Zhu et al. [12], Pirahandeh et al. [13], and Yin et al. [14], DPM schedulers were applied to single logical I/O operations or to multiple logical I/O operations (workloads), whereas the proposed DPM scheduler is applied to physical I/O operations. Irani et al. [1] described two types of DPM: off-line and on-line. An off-line DPM knows the length of the idle period of the storage nodes in advance, whereas an on-line DPM does not. In terms of the DPM process type, the DPM schedulers proposed by Son et al. [9,10,11], Yin et al. [14], and Pirahandeh et al. [13] used on-line processes, and the DPM scheduler proposed by Zhu et al. [12] used either an on-line or an off-line process, whereas the proposed DPM scheduler uses both on-line and off-line processes.

The proposed hybrid storage application is constructed from a combination of SSDs and HDDs in a node array. In the proposed distributed storage application, the encoder pipelines the physical data blocks using an on-line DPM process to exploit the high transfer rate of the SSDs, while the parity blocks are pipelined later using an off-line DPM process. This enables the proposed distributed storage application to apply separate DPM methods to the data nodes using SSDs and the parity nodes using HDDs.

Table 1 Comparison of the developed scheduling methods for distributed storage applications

In this paper, we propose a power-mode scheduling method that reduces the energy consumption of distributed storage applications. The platform of the proposed distributed storage application consists of an initiator server, a target storage server, and iSCSI high-speed storage networks. The proposed application generates parity using the CPU and stripes parity and data blocks at the target server. An energy-aware encoding scheduler allows the storage application to switch the power mode of the storage nodes from active to idle or standby while no I/O operations are being performed. As a result, the average encoding energy consumption of the proposed storage application (SA3) is lower than that of the traditional storage applications SA1, SA2, SA4, and SA5. Moreover, the proposed storage application (SA3) achieves higher encoding and decoding performance than SA1, SA2, SA4, and SA5. Section 2 presents the background and motivation. Section 3 describes the proposed techniques in detail. The experimental environment and results are described in Sect. 4. Section 5 reports the conclusions.

Table 2 The pseudo-code of the parity generator using Cauchy RS codes [17]

2 Background and motivation

Distributed RAID storage applications designed to ensure reliability generate coding nodes, called “parity nodes”, using various erasure coding schemes to provide node fault tolerance. There are various types of erasure codes, such as Reed–Solomon (RS) codes and single parity check (SPC) codes, defined by their coding scheme. In distributed RAID storage applications, we assume that the workload D consists of k data chunks, \(D= \{D_0,\ldots , D_{k-1}\}\). The ith data chunk consists of nw data blocks, \(D_i=\{d_0,\ldots ,d_{nw-1}\}\), and the ith parity chunk consists of parity blocks \(P_i = \{p_0,\ldots ,p_{mw-1}\}\), where mw denotes the number of stripes. Table 2 shows the pseudo-code of the parity generator using Cauchy RS codes.

In erasure codes, we assume that \((Data|Coding)=(D|P)= D \times G = D \times (I | X)\), where the generator matrix \(G = (I | X)\) consists of the identity matrix I with \(nw \times nw\) code words and the coding matrix \(X = \{x_{0,0},\ldots ,x_{0,nw-1},\ldots ,x_{mw-1,nw-1}\}\) with \(m \times n\) coding blocks. XOR operations are executed between data code words in a stripe. In storage applications using CPU cores (SA1, SA2, SA4, and SA5), the sequence of XOR operations for generating a parity code word \(P_i\) at the ith stripe is as follows:

$$\begin{aligned} P_i= \bigoplus _{j=0}^{nw-1} (d_j \times x_{i,j})=(d_0 \times x_{i,0})\oplus \cdots \oplus \left( d_{nw-1}\times x_{i,nw-1}\right) . \end{aligned}$$
(1)
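
For illustration only, the following minimal Python sketch computes the parity code words of Eq. (1) under the assumption that the Cauchy RS coding matrix X has binary entries, so that each product \(d_j \times x_{i,j}\) reduces to selecting the data code words with \(x_{i,j}=1\) and XORing them; it is not the authors' implementation.

```python
# Minimal sketch of Eq. (1): parity code words from a binary (Cauchy RS style)
# coding matrix. Names (nw, mw, X, D) follow the notation of Sect. 2; the
# byte-level representation is an assumption made for illustration.

def encode_stripes(data_words, coding_matrix):
    """data_words:    list of nw equal-sized byte strings (d_0 ... d_{nw-1})
    coding_matrix: mw x nw list of 0/1 entries (x_{i,j})
    returns:       list of mw parity code words (p_0 ... p_{mw-1})"""
    parities = []
    for row in coding_matrix:                      # one row per parity code word P_i
        p = bytearray(len(data_words[0]))          # accumulator initialized to zero
        for d, x in zip(data_words, row):
            if x:                                  # d_j * x_{i,j} with x in {0, 1}
                p = bytearray(a ^ b for a, b in zip(p, d))   # XOR accumulation
        parities.append(bytes(p))
    return parities

# Example: n = 2 data words, m = 1 parity word (RAID 5 / SPC-like coding row)
data = [b"\x0f\x0f", b"\xf0\x01"]
print(encode_stripes(data, [[1, 1]]))              # -> [b'\xff\x0e']
```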

2.1 Traditional energy-aware I/O scheduling method in distributed storage applications

Figure 1 shows a sequence diagram of the traditional energy-aware I/O scheduling method [15, 16] in a traditional RAID 5 storage application using two slave data nodes (two SSDs) and one slave parity node (one HDD). The logical write operation delay in traditional RAID level 5, LW, is as follows:

$$\begin{aligned} LW_{Online|Off{\text{-}}line} =&3 \times Spin up + 2 \times Ch+6 \times ABC+4\times ABR \nonumber \\&+2\times XOR +6 \times SR+6\times PW + 3 \times Spin down. \end{aligned}$$
(2)
Table 3 Notation list

As noted in Table 3, Spinup, Ch, ABC, ABR, XOR, SR, PW, and Spindown denote the delay due to switching the disk power mode from standby/idle to active, splitting data into chunks, allocating a block to main memory, allocating a block from/to the register, performing the XOR operation using a parity computing scheduler, reading a block from main memory by the system kernel, physically writing a block to the disk using a RAID device driver, and switching the disk power mode from active to standby/idle, respectively. The LW latency can be defined as the elapsed time (\(t_{wc}\)) between the command of an iSCSI initiator and the response indicating completion of the command, because the next command can be transmitted only after the write completion (WC) response of the target storage server. Finally, a logical write can be processed on-line or off-line regardless of the type of storage device in the node array.
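
To make Eq. (2) concrete, the short sketch below sums the per-operation delay counts of the traditional RAID 5 logical write; the delay values are hypothetical placeholders rather than measurements from the paper.

```python
# Hypothetical per-operation delays (seconds); placeholders, not measured values.
delays = {"Spinup": 1.5, "Ch": 0.001, "ABC": 0.0005, "ABR": 0.0002,
          "XOR": 0.0001, "SR": 0.0004, "PW": 0.002, "Spindown": 1.0}

# Operation counts from Eq. (2): traditional RAID 5 with k=2, n=2, m=1, w=2.
counts_eq2 = {"Spinup": 3, "Ch": 2, "ABC": 6, "ABR": 4,
              "XOR": 2, "SR": 6, "PW": 6, "Spindown": 3}

lw_traditional = sum(counts_eq2[op] * delays[op] for op in counts_eq2)
print(f"LW (traditional, Eq. 2) = {lw_traditional:.4f} s")
```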

Fig. 1
figure 1

DPM-based logical write scheduling method in a traditional distributed storage application (\(k= 2, n= 2, m= 1, w= 2\))

3 Proposed energy-aware distributed storage application

The proposed storage application (SA3) is composed of an initiator server, a target storage server, and iSCSI high-speed storage networks. The aim is to reduce the energy consumption of the distributed storage system. At the initiator server, a data chunking module splits a file into multiple data chunks, and these data chunks are transferred to the target server using the iSCSI protocol via the storage network. At the target storage server, each data chunk is split into multiple data blocks, which are loaded into the CPU main memory. The parity generator creates parity blocks using a parity compute function and a memory allocator function. By allocating the parity generator to the target server, we can fairly compare the I/O performance of the proposed storage application with that of the traditional storage applications, because in all of them (SA1, SA2, SA4, SA5) the parity generator is allocated to the same storage server. The proposed energy-aware scheduler switches the power modes of the data disks using a power-mode-switching function and a disk (SSD and HDD) energy profiler. The energy-aware scheduler module reduces power consumption by switching SSDs between the active and idle modes and HDDs between the active and standby modes. Finally, an I/O redirector uses the energy-aware scheduling scheme to write data and parity blocks to the hybrid storage, as sketched below.
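
The following simplified Python sketch traces a write through this pipeline (chunking at the initiator, block splitting and single-parity generation at the target, then separate handling of data and parity blocks); all function names are illustrative assumptions, not the actual modules.

```python
# Illustrative write path for the proposed storage application (SA3).
# All names are hypothetical; the real modules run on separate servers over iSCSI.

def chunk_file(payload: bytes, chunk_size: int):
    """Initiator side: split a file into fixed-size data chunks."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

def split_blocks(chunk: bytes, n: int):
    """Target side: divide one chunk into n equal data blocks (zero-padded)."""
    size = -(-len(chunk) // n)                      # ceiling division
    chunk = chunk.ljust(size * n, b"\x00")
    return [chunk[i * size:(i + 1) * size] for i in range(n)]

def xor_parity(blocks):
    """Target side: single-parity (SPC / RAID 5-like) parity block."""
    parity = bytearray(len(blocks[0]))
    for b in blocks:
        parity = bytearray(x ^ y for x, y in zip(parity, b))
    return bytes(parity)

def write_pipeline(payload: bytes, chunk_size=4096, n=2):
    for chunk in chunk_file(payload, chunk_size):
        data_blocks = split_blocks(chunk, n)
        parity_block = xor_parity(data_blocks)
        # On-line phase: data blocks go to the SSD data nodes immediately.
        # Off-line phase: the parity block is redirected to the HDD parity node later.
        yield data_blocks, parity_block

for data_blocks, parity in write_pipeline(b"example payload" * 100):
    pass  # an I/O redirector would issue the physical writes here
```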

Fig. 2
figure 2

Sequence diagram of the DPM-based logical write scheduling method in the proposed storage application (\(k= 2, n= 2, m= 1, w= 2\))

3.1 Energy-aware write scheduler at the initiator and the target server

Figure 2 shows the sequence diagram of the proposed DPM-based logical write RAID 5 scheduling method in the proposed storage application (SA3) using two data nodes (two SSDs) and one parity node (one HDD). The logical write operation delay in the proposed distributed application, LW, is defined as:

$$\begin{aligned} LW_{On{\text{-}}line}= 2\times Spin up+ 2\times Ch+ 4\times ABC \nonumber \\ + 4\times SR+ 4\times PW + 2 \times Spin down. \end{aligned}$$
(3)
$$\begin{aligned} LW_{Off{\text{-}}line}=Spin up+ 4\times ABR+ 2\times XOR+ 2\times ABC + 2\times SR+ 2\times PW + Spin down. \end{aligned}$$
(4)

\(LW_{On{\text{-}}line}\) latency can be defined as the elapsed time (\(t_{DWC}\)) between a command of the iSCSI initiator and the data write completion (DWC) response. \(LW_{Off{\text{-}}line}\) latency can be defined as the elapsed time (\(t_{PWC}-t_{DWC}\)) between the DWC response and the parity write completion (PWC) response, because the next command can be transmitted immediately after the DWC response of the target storage server.
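
The on-line/off-line split of Eqs. (3) and (4) can be illustrated with the sketch below, which evaluates the two delays separately; the per-operation delay values are placeholders only.

```python
# Placeholder per-operation delays (seconds); not measurements from the paper.
delays = {"Spinup": 1.5, "Ch": 0.001, "ABC": 0.0005, "ABR": 0.0002,
          "XOR": 0.0001, "SR": 0.0004, "PW": 0.002, "Spindown": 1.0}

# Eq. (3): on-line data write on the SSD data nodes (t_DWC).
online = {"Spinup": 2, "Ch": 2, "ABC": 4, "SR": 4, "PW": 4, "Spindown": 2}
# Eq. (4): off-line parity write on the HDD parity node (t_PWC - t_DWC).
offline = {"Spinup": 1, "ABR": 4, "XOR": 2, "ABC": 2, "SR": 2, "PW": 2, "Spindown": 1}

lw_online = sum(c * delays[op] for op, c in online.items())
lw_offline = sum(c * delays[op] for op, c in offline.items())
print(f"t_DWC = {lw_online:.4f} s, t_PWC - t_DWC = {lw_offline:.4f} s")
```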

Table 4 Distributed RAID decoder

3.2 Energy-aware distributed RAID decoder

Table 4 shows the distributed RAID decoder, which can recover from up to m disk failures. In lines (1–3), the distributed RAID decoder spins up the data disks located in the data nodes, reads a data chunk from the n data disks, and finally spins down the data disks. In line (4), when the number of data disk failures \(n-\Phi (G)\) exceeds m, the failure is a disaster that cannot be recovered from. In lines (5–13), when \(n-\Phi (G)\) disk failures occur, the algorithm recovers up to mw failed data code words. In line (6), the distributed RAID decoder spins up the parity disks located in the parity nodes. In line (7), it allocates the corresponding survival data and parity code words to ds. In line (8), it spins down the parity disks. In lines (9–10), the RAID decoder loads the inverted coding code words \(B^{-1}\), in which the columns corresponding to the failed data code words are deleted. In line (11), using the parity computation algorithm, the distributed RAID decoder recovers the failed data code words by performing matrix multiplication between the inverted coding code words \(B^{-1}\) and the survival data and parity code words ds. In line (12), the survival data code words and the recovered data code words \(d'\) are merged to recover d[nw]. Finally, in line (13), the recovered data chunks d[nw] are read from main memory into the host.
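
As a hedged illustration of this flow, the sketch below handles only the simplest single-parity case (\(m = 1\)), where recovery reduces to XORing the survival code words with the parity code word; the general Cauchy RS case would instead multiply the survival code words by the inverted coding code words \(B^{-1}\).

```python
# Simplified single-failure (m = 1) recovery sketch mirroring the flow of Table 4:
# spin up the parity node, read survival code words, XOR-recover, merge, return.

def xor_words(words):
    acc = bytearray(len(words[0]))
    for w in words:
        acc = bytearray(a ^ b for a, b in zip(acc, w))
    return bytes(acc)

def degraded_read(data_words, parity_word, failed_index):
    """data_words: list with None at the failed position; parity_word: p_0."""
    survivors = [w for i, w in enumerate(data_words) if i != failed_index]
    # Lines (6)-(8): the parity disk is spun up, read, and spun down around this access.
    recovered = xor_words(survivors + [parity_word])     # lines (9)-(11) analogue
    data_words[failed_index] = recovered                 # line (12): merge
    return data_words                                    # line (13): return d[nw]

d = [b"\x0f\x0f", None]                        # SSD 2 has failed
p = xor_words([b"\x0f\x0f", b"\xf0\x01"])      # parity written during encoding
print(degraded_read(d, p, failed_index=1))     # -> [b'\x0f\x0f', b'\xf0\x01']
```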

3.3 Evaluation of energy consumption

In RAID storage systems, data should be archived along with parity data across many storage devices, so that if a storage device fails, there is still sufficient data to repair the disks [2]. It is therefore important to develop an energy-saving storage system based on the RAID structure. Irani et al. [1] elucidated that disk power management aims to save energy by switching disks to lower-power modes whenever possible without adversely affecting performance. As soon as I/O operations are completed, the DPM decides whether the disk should stay in the idle mode; however, the idle period needs to justify the cost of spinning the disk up and down. If the encoding time is greater than the time needed to spin the disk up or down, then it is more beneficial to spin the disk down to a lower-power standby mode. In that case, the disk is spun down immediately after the current request is serviced and spun up to the active mode just in time for the next request. Otherwise, it is better to stay in the idle mode after the current request completes. Irani et al. [1] noted that the DPM strategy must make decisions with only partial information: an on-line power management method must decide about the expenditure of resources before all the input to the system is available, and therefore does not know the length of an idle period until the moment it ends, whereas an off-line DPM knows the length of the idle period in advance.
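
The break-even rule described above can be written as a small decision function: spin the disk down only if the expected idle period is long enough to amortize the spin-down/spin-up cost. The energy and time constants below are illustrative placeholders, not profiled values.

```python
# Break-even spin-down decision (illustrative constants, not measured values).
E_IDLE_W = 4.0         # power draw while idle (W)
E_TRANSITION_J = 12.0  # energy needed to spin down and back up (J)
T_TRANSITION_S = 3.0   # time lost spinning down and back up (s)

def should_spin_down(expected_idle_s: float) -> bool:
    """Spin down only if staying idle would cost more energy than the
    spin-down/spin-up transition, and the idle window is long enough."""
    idle_energy = E_IDLE_W * expected_idle_s
    return expected_idle_s > T_TRANSITION_S and idle_energy > E_TRANSITION_J

print(should_spin_down(1.0))   # False: too short to justify the transition
print(should_spin_down(10.0))  # True:  idle energy exceeds the transition cost
```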

Disk power management We propose a new DPM for encoding and decoding physical data blocks. The proposed DPM enables on-line DPM processing for the data SSDs and off-line DPM processing for the parity HDDs. The coding power, the total energy cost of CPU-based encoding, E(t), and the throughput are calculated as follows:

$$\begin{aligned} e_{coding}=e_{CPU cycle} \times N_{core}. \end{aligned}$$
(5)
$$\begin{aligned} E(t)=&\sum \limits _{i=1}^{m+n} e_{active{\text{-}}Disk_{i}}(t_{active{\text{-}}Disk_{i}}-T_{active{\text{-}}Disk_{i}})+ C_{active{\text{-}}Disk_{i}} \nonumber \\&+ e_{idle|standby{\text{-}}Disk_{i}} t_{idle|standby{\text{-}}Disk_{i}} + e_{coding} t_{coding}. \end{aligned}$$
(6)
$$\begin{aligned} Thr=\frac{D_{size}}{t_{SSD{\text{-}}active}+t_{HDD{\text{-}}active}+t_{coding}+t_{HDD{\text{-}}standby}+t_{SSD{\text{-}}idle}}. \end{aligned}$$
(7)

where \(t_{idle|standby}\), \(t_{coding}\), and \(t_{active}\) denote the idle/standby time, the time to generate parity, and the read/write time, respectively. The power consumptions of the coding, active, and idle/standby modes are denoted by \(e_{coding}\), \(e_{active}\), and \(e_{idle|standby}\), respectively, and the power consumption of each CPU cycle is denoted by \(e_{CPU cycle}\). The numbers of data disks, coding disks, and CPU cores are denoted by n, m, and \(N_{core}\), respectively.

Fig. 3
figure 3

Examples of three RAID 5 logical write (encoding) power management models. a The proposed model, b the naive model, and c Pirahandeh et al.’s model (\(k=2\), \(w=2\), \(n=2\), \(m=1\))

Fig. 4
figure 4

Examples of three RAID logical degraded read (decoding) power management models when SSD 2 has failed. a The proposed model, b the naive model, and c Pirahandeh et al.’s model (\(k=2\), \(w=2\), \(n=2\), \(m=1\))

In addition, \(T_{active}\) is the time required to spin down a disk from the active to the idle/standby mode, and \(C_{active}\) is the energy required to spin up a disk from the idle/standby to the active mode. The throughput of a storage system, Thr, is the ratio of the data size \(D_{size}\) to the encoding time at the target server, as shown in Eq. (7). The encoding time is therefore the sum of the idle time, the standby time, the coding time, and the active time.
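
A direct transcription of Eqs. (5)–(7) into code may help clarify how the terms combine; all numeric constants below are placeholders rather than the measured profile values used in Sect. 4.

```python
# Direct transcription of Eqs. (5)-(7) with placeholder constants.

def coding_energy(e_cpu_cycle, n_core):
    """Eq. (5): coding power from the per-cycle CPU power and the number of cores."""
    return e_cpu_cycle * n_core

def total_energy(disks, e_coding, t_coding):
    """Eq. (6): sum over the n + m disks plus the coding term.
    Each disk is a dict with e_active, t_active, T_active (spin-down time),
    C_active (spin-up energy), and e_low / t_low (idle or standby)."""
    e = 0.0
    for d in disks:
        e += d["e_active"] * (d["t_active"] - d["T_active"]) + d["C_active"]
        e += d["e_low"] * d["t_low"]
    return e + e_coding * t_coding

def throughput(d_size, t_ssd_active, t_hdd_active, t_coding,
               t_hdd_standby, t_ssd_idle):
    """Eq. (7): data size over the total encoding time at the target server."""
    return d_size / (t_ssd_active + t_hdd_active + t_coding
                     + t_hdd_standby + t_ssd_idle)

e_c = coding_energy(e_cpu_cycle=0.5, n_core=4)                 # placeholder values
disks = [{"e_active": 6.0, "t_active": 2.0, "T_active": 0.5,
          "C_active": 10.0, "e_low": 1.0, "t_low": 8.0}] * 3   # n + m = 3 disks
print(total_energy(disks, e_c, t_coding=1.0))                  # E(t) in joules
print(throughput(256e6, 1.2, 0.8, 1.0, 6.0, 7.0))              # Thr in bytes/s
```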

Energy-aware model analysis Three RAID power management models are used to measure the energy flow, as shown in Figs. 3 and 4. The SSDs have the power modes ON, OFF, active, and idle, whereas the HDDs have the power modes sleep (the drive is shut down), standby (a low-power mode in which the drive is spun down), and active (normal operation). The RAID power management equipment enables power-mode switching when the encoder or decoder reads data code words from the storage devices (\(t_{read}\)), when parity is generated using the CPU (\(t_{PG}\)), and when code words are written to the storage devices (\(t_{write}\)).

  1. (a)

    Figures 3a and 4a show the energy flow of the encoding and decoding processes, respectively, using the proposed RAID power management model (SA3). To reduce energy consumption, the proposed model switches the disk power from the idle/standby mode to the active mode (spin up) before each read/write operation occurs and switches it back from the active mode to the idle/standby mode (spin down) after each read/write operation is completed.

  2. (b)

    Figures 3b and 4b show the naive RAID power management models proposed by Son et al. [9,10,11] (SA1) and Zhu et al. [12] (SA2), respectively. The naive model is designed based on the CPU-based parity generation algorithm. To reduce energy consumption, the naive model switches the disk power from the idle/standby mode to the active mode (spin up) before the encoding or decoding process starts and switches it back from the active mode to the idle/standby mode (spin down) after the encoding or decoding process is completed.

  3. (c)

    Figures 3c and 4c show the RAID power management model proposed by Pirahandeh et al. [13]. This model is also designed based on the CPU-based parity generation algorithm. To reduce energy consumption, Pirahandeh et al.’s model switches the SSD power from the idle mode to the active mode (spin up) before the encoding process starts and switches it back from the active mode to the idle mode (spin down) after the encoding process is completed. The HDD power is in the active mode only when the encoder performs a parity write operation. During the decoding process, when SSD 2 fails, the survival SSDs are in the active mode, and when the failed data is recovered, the failed SSD is in the active mode. However, this model switches the HDD power to the active mode when the decoder recovers failed data by accessing the survival SSDs and the parity HDD.

The energy-aware scheduling system consists of a disk energy profiler and a power-mode-switching function. The disk energy profiler initializes the power-mode profile of the storage devices. The power-mode-switching function switches the storage device to the idle/standby power mode, based on the proposed RAID power management model, as soon as an I/O operation is completed.
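
A minimal sketch of these two components is given below, under the assumption that the profiler merely tracks per-device modes and that SSDs switch between active and idle while HDDs switch between active and standby; the actual device power commands are omitted.

```python
# Minimal sketch of the disk energy profiler and the power-mode-switching
# function; device command issuing is intentionally omitted (illustrative only).

LOW_POWER_MODE = {"SSD": "idle", "HDD": "standby"}   # per-device low-power mode

class DiskEnergyProfiler:
    """Initializes and tracks the power-mode profile of each storage device."""
    def __init__(self, devices):
        # devices: mapping of device name -> "SSD" or "HDD"
        self.kind = dict(devices)
        self.mode = {name: LOW_POWER_MODE[k] for name, k in devices.items()}

    def switch(self, name, active: bool):
        """Power-mode-switching function: active for I/O, low-power otherwise."""
        self.mode[name] = "active" if active else LOW_POWER_MODE[self.kind[name]]
        return self.mode[name]

profiler = DiskEnergyProfiler({"ssd0": "SSD", "ssd1": "SSD", "hdd0": "HDD"})
profiler.switch("ssd0", active=True)     # spin up before the data write
profiler.switch("ssd0", active=False)    # back to idle as soon as the I/O completes
profiler.switch("hdd0", active=False)    # parity HDD stays in standby until needed
print(profiler.mode)
```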

Fig. 5
figure 5

Specifications of the experimental environment

4 Experimental results

The proposed architecture of the energy-aware RAID storage system was implemented. Figure 5a–c shows the target/initiator server and the workload specifications. Figure 5d shows the SSD and HDD specifications of the hybrid storage application.

Fig. 6
figure 6

One-disk failure recovery (decoding) performance of the SPC, RS, and EVENODD erasure codes for the given chunk sizes and storage applications

Fig. 7
figure 7

Energy consumption of the SPC, RS, and EVENODD erasure codes for the given chunk sizes and storage applications

4.1 Energy-aware degraded read performance

Figure 6 shows the decoding performance of the storage applications (SA1, SA2, SA3, SA4, and SA5) using the SPC, RS, and EVENODD erasure codes for the given chunk sizes (4 KB, 16 KB, 64 KB, 256 KB, and 1 MB). The average throughput of one-disk failure recovery using the proposed storage application (SA3) is improved by 67%, 63%, 27%, and 41% compared with that of SA1, SA2, SA4, and SA5, respectively. More specifically, the average decoding performance of the storage applications using the EVENODD code is slightly lower than that of the storage applications using the RS and SPC codes, because the EVENODD erasure code has more one-bit code words in its coding code words, which increases the number of required XOR operations.

4.2 Energy consumption performance

Figure 7 and Table 5 show the encoding energy consumption of the storage applications (SA1, SA2, SA3, SA4, and SA5) using the SPC, RS, and EVENODD erasure codes for the given chunk sizes (4 KB, 16 KB, 64 KB, 256 KB, and 1 MB). Equations (5) and (6) are used to measure the encoding energy consumption of these storage applications. The average encoding energy consumption of the proposed storage application (SA3) is improved by 36%, 28%, 26%, and 27% compared with that of SA1, SA2, SA4, and SA5, respectively. In SA3, \(t_{coding}\) decreases in proportion to the increase in the number of coding disks, because \(t_{active}\) and \(t_{coding}\) under SA3 decrease more sharply than those of the other applications as the chunk size increases. Table 5 shows the specifications of the experimental DPM method for the proposed storage application compared with those of the traditional storage applications.

Table 5 Specifications of the experimental DPM method for the proposed storage application compared to those of traditional storage applications

5 Conclusion

The proposed system provides energy-aware scheduling for distributed RAID storage applications at the target server with higher I/O performance. The proposed distributed RAID scheduling method differs from existing RAID methods in that it reduces energy consumption by striping data and parity blocks and switching the power modes of the storage devices separately. The proposed power management model is applied to both the SSD-based data storage and the HDD-based parity storage. Experimental results show that the proposed storage application (SA3) achieves decoding throughput that is 67%, 63%, 27%, and 41% higher than that of SA1, SA2, SA4, and SA5, respectively. The proposed storage application also exhibits 89%, 75%, 71%, and 65% faster parity computation than SA1, SA2, SA4, and SA5, respectively.