Abstract
By moving computations from computing nodes to storage nodes, active storage technology provides an efficient approach for data-intensive high-performance computing applications. Existing studies, however, have neglected the impact of storage-node heterogeneity on the performance of active storage systems. We introduce CADP, a capability-aware data placement scheme for heterogeneous active storage systems that aims to achieve high-performance data processing. The basic idea of CADP is to place data on storage nodes according to both their computing capability and their storage capability, so that load imbalance among heterogeneous servers is avoided. We have implemented CADP within a parallel I/O system. Experimental results show that the proposed capability-aware data placement scheme significantly improves the performance of active storage systems.
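The placement idea summarized above can be illustrated with a small sketch. The cost model here is an assumption made for illustration and is not the CADP algorithm itself. Each node is described by a hypothetical `compute_mbps` (how fast it runs the offloaded kernel) and `storage_mbps` (how fast it reads data from its devices). Reading and processing are assumed to run back to back, which gives a per-node effective rate. Equal-sized stripes are then assigned in proportion to that rate so that all nodes finish at roughly the same time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StorageNode:
    name: str
    compute_mbps: float  # hypothetical: kernel processing rate on this node (MB/s)
    storage_mbps: float  # hypothetical: data read rate from this node's devices (MB/s)

    def effective_rate(self) -> float:
        # Assumed cost model: read then compute serially, so the
        # time per MB is 1/storage_mbps + 1/compute_mbps.
        return 1.0 / (1.0 / self.compute_mbps + 1.0 / self.storage_mbps)


def place_stripes(nodes: List[StorageNode], num_stripes: int) -> Dict[str, int]:
    """Split num_stripes equal stripes across nodes in proportion to effective rate."""
    rates = [n.effective_rate() for n in nodes]
    total = sum(rates)
    ideal = [num_stripes * r / total for r in rates]
    counts = [int(x) for x in ideal]  # floor of each ideal share
    # Largest-remainder rounding so the counts sum exactly to num_stripes.
    leftover = num_stripes - sum(counts)
    order = sorted(range(len(nodes)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return {n.name: c for n, c in zip(nodes, counts)}


if __name__ == "__main__":
    cluster = [
        StorageNode("fast", compute_mbps=400, storage_mbps=400),
        StorageNode("slow_cpu", compute_mbps=100, storage_mbps=300),
        StorageNode("slow_disk", compute_mbps=400, storage_mbps=100),
    ]
    print(place_stripes(cluster, 71))
```

With these made-up figures, the node that is strong in both resources receives 40 of the 71 stripes, while the nodes limited by computation or by storage receive 15 and 16. A placement that considered only storage bandwidth would overload `slow_cpu`.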
Additional information
Foundation item: Supported by the National Science and Technology Foundation of China (61572377), the Natural Science Foundation of Hubei Province (2014CFB239), the Open Fund from HPCL (201512-02), the Open Fund from SKLSE (2015-A-06), and the US National Science Foundation (CNS-1162540).
Biography: LI Xiangyu, male, Ph.D. candidate, research directions: file and storage systems, high-performance computing, distributed systems, and computer networks.
About this article
Cite this article
Li, X., He, S., Xu, X. et al. Capability-aware data placement for heterogeneous active storage systems. Wuhan Univ. J. Nat. Sci. 21, 249–256 (2016). https://doi.org/10.1007/s11859-016-1167-4
DOI: https://doi.org/10.1007/s11859-016-1167-4