Mathematical Model to Predict IO Performance Based on Drive Workload Parameters

Mohanta, Taranisen; Muddi, Leena; Chirumamilla, Narendra; Revuri, Aravinda Babu

doi:10.1007/978-81-322-2529-4_41

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 44)

Abstract

Disk drive technologies have evolved rapidly over the last decade to address the needs of big data. With the rapid growth of social media, data availability and data protection have become essential. The availability and protection of data ultimately depend on the reliability of the disk drive. Because of their speed and performance at minimum cost, disk drives still play a vital role in the data storage industry alongside faster storage devices such as NVRAM and SSD. A disk drive performance model plays a critical role in sizing an application so that its performance meets business needs. The proposed disk drive performance model predicts how well an application will perform on a selected disk drive, in terms of performance indices such as response time, MBPS, and IOPS, when the disk runs the intended workload.

1 Introduction

The rapid growth of social media over the last decade has changed electronic data storage. Data is essentially stored on disk drives. Disk drive technology has also evolved rapidly to meet the needs of big data, data retrieval, data storage, and data mining. The protection of data essentially depends on the reliability of the disk drive. Because of their speed and performance at minimum cost, disk drives still play a vital role in the data storage industry alongside faster storage devices such as NVRAM and SSD. A disk drive performance model plays a critical role in sizing an application so that its performance meets the business need.

This paper applies and compares different mathematical models to predict disk drive performance from disk drive performance data collected in real time. It compares the measured performance of the disk drive under different kinds of workloads and attributes, and predicts the performance for a user-required workload using the proposed model. The goal is to size an application based on disk drive performance so that the application meets the performance the business requires. The model can be used by hard disk pre-sales or marketing teams to predict the IO performance of storage systems running different applications.

The proposed disk drive performance model predicts how well an application will perform on a selected disk drive, in terms of performance indices such as response time, MBPS, and IOPS, when the disk runs the intended workload. The experimental results validate the accuracy of the proposed models within an error bound of 5 % using performance data collected in real time. This paper compares performance prediction using two different models and suggests that the linear polynomial method is the better model, as it shows the least deviation from the actual performance data.

2 Background and Related Work

The data storage industry uses different storage technologies such as DAS (Direct Attached Storage), NAS (Network Attached Storage), and SAN (Storage Area Network). These technologies are used in modern datacenters and essentially rely on disk drives. Disk performance plays a major role in meeting users' needs, which depend on their type of usage and their applications. The performance of a storage system depends on the performance of its hard disks. Different kinds of storage systems exist to meet the high performance needs of applications, based on technologies such as NVRAM, SSD, FC, SCSI, and SATA [1–3].

Many attempts have been made to compute or analyze the performance of disk drives. When setting up a storage system, performance prediction for different kinds of workloads plays a major role. Several attempts have been made to predict the performance of different kinds of hard disk drives [1–4]. Much research has been done on hard disk performance models, using both analytical models and simulation. However, deploying such a model is a big challenge in terms of the time, expertise, complexity, and resources required to run the predictive model. Others, such as the authors of Ref. [2], have proposed approaches to hard disk performance prediction using machine learning tools such as the CART model and artificial neural network models. Similarly, the authors of [5] proposed a different approach based on Adaptive Neuro-Fuzzy Inference Systems.

In this situation, it is highly desirable to have a black box model for disk drive performance prediction with a simple and accurate algorithm. Although research has been done on different black box models for disk drives [6–8], an efficient, simple, and improved model is still highly desirable. The goal is to devise a method that finds the performance model without any prior detail of the complex design and algorithms of the disk drive. Obtaining the performance model requires a working storage setup with access to the disk drive. The model has to be trained, using an efficient mathematical equation, on the different kinds of workload inputs that potentially affect disk drive performance; across these scenarios, the data generation system learns the behavior and functionality of the disk drive.

2.1 Polynomial Model for Prediction

The polynomial model is considered the simplest one that analyzes the data effectively. Although polynomials can be of different orders, the order should be kept as low as possible. High-order polynomials should be avoided unless they can be justified for reasons outside the data. Therefore, to predict the performance of the hard disk, we consider a linear polynomial model that analyzes the data and predicts the performance of the disk drive. A linear polynomial is any polynomial defined by an equation of the form

$$ p\left( x \right) = ax + b $$

(1)

where a and b are real numbers and $ a \neq 0 $.
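
As an illustration of how the coefficients of Eq. (1) could be estimated from measurements, the following Python sketch fits p(x) = ax + b by ordinary least squares. The choice of least squares, the helper name `fit_linear`, and the sample points (which are synthetic and lie exactly on a known line) are assumptions made for this example; they are not taken from the experiments.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of p(x) = a*x + b; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic points lying exactly on y = 2x + 1 (not measured data).
xs = [1, 2, 4, 8, 16]
ys = [2 * x + 1 for x in xs]
a, b = fit_linear(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the synthetic points lie on a straight line, the fit should recover the slope and intercept used to generate them, up to floating-point rounding.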

2.2 Radical Model for Prediction

To analyze the data and predict the performance of the disk drives, a radical equation, derived from the pattern of the experimental results, can also be used.

To show the superiority of the linear polynomial model, we have compared its prediction of disk performance with that of the radical equation model.

A radical equation is one that includes a radical sign, such as a square root $ \sqrt x $, a cube root $ \sqrt[3]{x} $, or an nth root $ \sqrt[n]{x} $. A common form of a radical equation is

$$ a = \sqrt[n]{{x^{m} }} $$

(2)

which is equivalent to $ a = x^{\frac{m}{n}} $, where m and n are integers.

3 Approach and Uniqueness

The main goal, and the primary difference from other competing approaches to performance prediction, is to design a disk drive performance model that requires no prior knowledge of the disk drive's functional design or implementation details. This enables us to use mathematical tools to implement the methodology, and the same methodology can be applied to a wide variety of storage systems with minimal (ideally no) additional effort to predict performance.

3.1 Workload Representation in the Model

As already mentioned, our performance model uses real performance data, against which the proposed models are validated for prediction. In particular, we have identified the parameters whose variation influences the performance of the disk drive. In any disk drive based storage device, the major factors that change its performance are the different modes and types of IOs performed by applications on the host. The typical working model is shown in Fig. 1.

To arrive at the best performance model, the following parameters are considered in the study.

Data access pattern (random, sequential, or segmented; encoded in this case as 1, 2, and 3, respectively).
Type of IO request (reads, writes, or a mix of reads and writes).
Data transfer size (KB).
Q-depth: the number of processes simultaneously used to issue IO requests to the disk drive.

The output performance measures of the disk drive that have to be predicted are

1. IOs per second.
2. MBs per second.
3. Response time in milliseconds.

The input data and the output data to be predicted are labeled Int (input data) and Out (the actual output to be obtained), respectively.
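
The workload inputs and predicted outputs described in this section can be captured in a small data structure. The Python sketch below is one possible representation: the class and field names, the string labels for the IO types, and the validation rules are hypothetical and chosen only for illustration. Only the numeric codes 1, 2, and 3 for the access pattern come from the text.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class AccessPattern(IntEnum):
    """Numeric codes for the data access pattern, as given in the text."""
    RANDOM = 1
    SEQUENTIAL = 2
    SEGMENTED = 3


class IOType(Enum):
    """Type of IO request; the string labels are hypothetical."""
    READ = "read"
    WRITE = "write"
    MIXED = "mixed"


@dataclass(frozen=True)
class Workload:
    """Model input: the four workload parameters (hypothetical container)."""
    access_pattern: AccessPattern
    io_type: IOType
    transfer_size_kb: float
    queue_depth: int

    def __post_init__(self):
        # Basic sanity checks; these rules are assumptions for this sketch.
        if self.transfer_size_kb <= 0:
            raise ValueError("transfer size must be positive")
        if self.queue_depth < 1:
            raise ValueError("queue depth must be at least 1")


@dataclass(frozen=True)
class Performance:
    """Model output: the three performance indices to be predicted."""
    iops: float
    mbps: float
    response_time_ms: float


# Example input: 8 KB random reads issued at a queue depth of 4.
workload = Workload(AccessPattern.RANDOM, IOType.READ, 8, 4)
print(workload)
```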

4 Results and Contributions

The sample data was collected from the test setup shown in Fig. 2. The setup has a Windows host server connected to a storage system through an FC SAN switched infrastructure. At the back end of the storage system, a series of daisy-chained disk enclosures containing 72 GB disks is used to pool the disk space.

To observe the behavior and results of the different approaches, we present a set of comparison graphs. The percentage relative error is defined as

$$ R_{e} = \left| {T_{op} (Model) - T_{op} (Real)} \right| * 100/T_{op} (Real) $$

(3)

where,

op :: read or write,
$ T_{op} (Real) $ :: real throughput,
$ T_{op} (Model) $ :: throughput predicted by the model.
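
Eq. (3) translates directly into code. In the Python sketch below, the function name and the two throughput values in the usage line are hypothetical and serve only to show the arithmetic.

```python
def relative_error_percent(t_model, t_real):
    """Eq. (3): percentage relative error of a predicted throughput."""
    if t_real == 0:
        raise ValueError("real throughput must be non-zero")
    return abs(t_model - t_real) * 100 / t_real


# Hypothetical values: a prediction of 190 against a measured 200.
print(relative_error_percent(190, 200))  # prints 5.0
```

A prediction is within the 5 % error bound used in this paper when the returned value does not exceed 5.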

Figure 3 compares the linear polynomial method, the radical method, and the actual measurements for 5K reads.

The polynomial model derived from the sample data points is

$$ f\left( x \right) = \frac{{\left( {b * x} \right) + a}}{(x + c)}. $$

(4)

where a = 40.58, b = 2.12, and c = 7.024 are constants obtained by regression analysis of the sample data.

The estimated response time (RT) is calculated as

$$ RT = f(x) * x $$

(5)

where x is the queue depth, that is, the number of outstanding commands in the queue.

The estimated IOPs is calculated as

$$ IOPs = 1000/f(x) $$

(6)
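
Eqs. (4) to (6) can be evaluated with a few lines of Python. This is a sketch only: the function names and the queue depths in the loop are chosen for illustration, and reading f(x) as a time in milliseconds is an assumption suggested by the factor 1000 in Eq. (6) rather than something stated explicitly.

```python
# Constants reported for Eq. (4).
A = 40.58
B = 2.12
C = 7.024


def f_poly(x):
    """Eq. (4): f(x) = (b*x + a) / (x + c)."""
    return (B * x + A) / (x + C)


def response_time(x):
    """Eq. (5): RT = f(x) * x, where x is the queue depth."""
    return f_poly(x) * x


def iops(x):
    """Eq. (6): IOPs = 1000 / f(x)."""
    return 1000.0 / f_poly(x)


# Illustrative queue depths (not the measured sample points).
for depth in (1, 2, 4, 8, 16, 32):
    print(f"x={depth:2d}  f(x)={f_poly(depth):6.3f}  "
          f"RT={response_time(depth):7.3f}  IOPs={iops(depth):7.1f}")
```

Multiplying Eq. (5) by Eq. (6) gives RT × IOPs = 1000x, so the two estimates are tied together by the queue depth. As x grows, f(x) approaches b, so the predicted IOPs level off near 1000/b, which is approximately 472 for the constants above.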

The radical equation derived from the sample data points is

$$ f\left( x \right) = ( a + b)/\sqrt {(1 + x)} $$

(7)

where a = 2 and b = 4.75.

The estimated response time (RT) is calculated as

$$ RT = f(x) * x $$

(8)

where x is the queue depth, that is, the number of outstanding commands in the queue.

The estimated IOPs is calculated as

$$ IOPs = 1000/f(x) $$

(9)
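
The radical model of Eqs. (7) to (9) can be sketched in the same way. The function names and the queue depths in the loop are again hypothetical; only the constants a = 2 and b = 4.75 and the three formulas come from the text.

```python
import math

# Constants reported for Eq. (7).
A = 2.0
B = 4.75


def f_radical(x):
    """Eq. (7): f(x) = (a + b) / sqrt(1 + x)."""
    return (A + B) / math.sqrt(1 + x)


def response_time(x):
    """Eq. (8): RT = f(x) * x, where x is the queue depth."""
    return f_radical(x) * x


def iops(x):
    """Eq. (9): IOPs = 1000 / f(x)."""
    return 1000.0 / f_radical(x)


# Illustrative queue depths (not the measured sample points).
for depth in (1, 3, 8, 15, 24):
    print(f"x={depth:2d}  f(x)={f_radical(depth):6.3f}  "
          f"RT={response_time(depth):7.3f}  IOPs={iops(depth):7.1f}")
```

Under Eq. (7), f(x) keeps shrinking as x grows, so the predicted IOPs increase in proportion to the square root of 1 + x without levelling off.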

4.1 Comparison Between Actual, Radical, and Linear Polynomial Model for Different IO Rate

The graph shows that the polynomial curve is very close to the actual IOPS for 5K reads. Hence, the IOPS achievable for a given input x can be predicted simply by applying the linear polynomial model.

4.2 Comparison for Random Reads and Random Writes

Figure 4 shows similar evidence when IO throughput is plotted against response time using the linear polynomial method for different samples of 8K reads and 8K writes. The actual graph is very close to the modeled graph.

5 Conclusions

This paper presents an innovative approach to the performance modeling of disk drives using the linear polynomial model. The essential objective of this model is to support the design of a self-managed storage system for different kinds of applications, taking disk drive performance into account. The model can also be used to characterize the performance of a given storage device. If manufacturers published this model for their storage devices, as part of the specification or as a generic tool, potential buyers could trace their applications' performance and feed the traces into the models for different storage devices to see which device suits them best before buying. The results obtained with the linear polynomial model show that it performs better than its counterpart, the radical model. The scope for future research includes an advanced model that accommodates different disk sizes and vendor requirements, and a performance model for disk drive based storage arrays that considers the overhead of the array firmware on top of the disk drive firmware.

References

1. Anderson, E.: Simple table based modeling of storage devices. Technical Report HPL-SSP-2001-04, HP Laboratories (2001). http://www.hpl.hp.com/SSP/papers/
2. Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge (1995)
3. Wang, M., Au, K., Ailamaki, A., Brockwell, A., Faloutsos, C., Ganger, G.R.: Storage Device Performance Prediction with CART Models, pp. 588–595. MASCOTS (2004)
4. Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11(2), 431–441 (1963)
5. Jang, J.S.R.: ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man, Cybern. 23(5/6), 665–685 (1993)
6. Ruemmler, C., Wilkes, J.: An introduction to disk drive modeling. IEEE Comput. 27(3), 17–28 (1994)
7. Schindler, J., Ganger, G.R.: Automated disk drive characterization. In: Proceedings of the Sigmetrics 2000, pp. 109–126. ACM Press (2000)
8. Shriver, E., Merchant, A., Wilkes, J.: An analytic behavior model for disk drives with read ahead caches and request reordering. International Conference on Measurement and Modeling of Computer Systems. Madison, WI, 22–26 June 1998. Published as Perform. Eval. Rev. 26(1), 182–191. ACM, June 1998
9. Neural Network Toolbox: http://www.mathworks.com
10. Taranisen, M., Srikanth, A.: A method to predict hard disk failures using SMART monitored parameters. Recent Developments in National Seminar on Devices, Circuits and Communication (NASDEC2—06), pp. 243–246, Nov 2006

Author information

Authors and Affiliations

HP India Software Operations Pvt. Ltd, Bangalore, India
Taranisen Mohanta, Leena Muddi, Narendra Chirumamilla & Aravinda Babu Revuri

Corresponding author

Correspondence to Taranisen Mohanta.

Editor information

Editors and Affiliations

Department of Computer Science, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, India
Durga Prasad Mohapatra
Computer Science & Engineering, University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki

Cite this paper

Mohanta, T., Muddi, L., Chirumamilla, N., Revuri, A.B. (2016). Mathematical Model to Predict IO Performance Based on Drive Workload Parameters. In: Nagar, A., Mohapatra, D., Chaki, N. (eds) Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics. Smart Innovation, Systems and Technologies, vol 44. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2529-4_41

DOI: https://doi.org/10.1007/978-81-322-2529-4_41
Published: 03 September 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2528-7
Online ISBN: 978-81-322-2529-4
eBook Packages: Engineering, Engineering (R0)
