Abstract
As an important service model for advanced computing, SaaS uses a defined protocol to manage services and applications. The popularity of advanced computing has led to the generation of very large data sets, collectively called big data. Big data is growing in velocity, volume, and variety. This growth has called into question the capabilities of existing database tools. Storing and processing data were once simple tasks; they are now among the biggest challenges in the industry, and experts are paying close attention to big data. Designing a system capable of storing and analyzing such data to extract meaningful information for decision-making is a priority. Apache Hadoop, Spark, and NoSQL databases are among the core technologies being used to address these issues. This paper contributes to solving the problems of big data storage and processing. It presents an analysis of current industry technologies that could be useful in this context. Efforts have focused on implementing a novel Trinity model, which is built on the lambda architecture with the following technologies: Hadoop, Spark, Kafka, and MongoDB.
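The lambda architecture named in the abstract combines a batch layer that periodically recomputes views from an append-only master dataset, a speed layer that covers events received since the last batch run, and a serving layer that merges the two at query time. The following is a minimal in-memory sketch of that general pattern only; it is not the paper's Trinity model. The class and method names (`LambdaSketch`, `ingest`, `run_batch`, `query`) are hypothetical, and the comments mapping each structure to Kafka, Hadoop, Spark, or MongoDB are assumptions about typical roles, since the abstract does not state how the four technologies are assigned.

```python
from collections import Counter


class LambdaSketch:
    """Toy lambda architecture that counts events per key (illustrative only)."""

    def __init__(self):
        # Append-only master dataset (assumed stand-in for files on HDFS).
        self.master_log = []
        # View recomputed from scratch by the batch layer
        # (assumed stand-in for a Hadoop or Spark batch job).
        self.batch_view = Counter()
        # Incremental view kept by the speed layer for recent events
        # (assumed stand-in for a stream job reading from Kafka).
        self.realtime_view = Counter()

    def ingest(self, key):
        # Each event is stored in the master dataset and is also
        # applied immediately to the real-time view.
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Recompute the batch view over the full master dataset, then drop
        # the real-time counts that the new batch view already covers.
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, key):
        # Serving layer: merge batch and real-time results
        # (assumed stand-in for views stored in MongoDB).
        return self.batch_view[key] + self.realtime_view[key]


arch = LambdaSketch()
for event in ["login", "login", "purchase"]:
    arch.ingest(event)
before = arch.query("login")  # answered from the real-time view only

arch.run_batch()              # batch view now covers the first three events
arch.ingest("login")          # a new event arrives after the batch run
after = arch.query("login")   # batch count plus the one recent event
```

The sketch clears the real-time view as soon as the batch run finishes, which is safe here because everything runs in one thread. A real deployment would have to handle events that arrive while the batch job is still running.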
Author information
Xi Zheng [corresponding author] is a lecturer/assistant professor in computer science at Macquarie University, Australia. He earned a Ph.D. degree in software engineering from the University of Texas at Austin in August 2015. His current research focuses on the design and implementation of middleware for cyber-physical systems (CPS) and the Internet of Things in general. His Ph.D. thesis examines a practical way of bringing formal methods (e.g., temporal logics and automata theories) and physical models (in terms of real-time simulation) into CPS runtime verification.
Min Fu is currently a data mining advisor at Alibaba Group. He is also an honorary adjunct fellow in the Department of Computing, Macquarie University, Australia. He received his Ph.D. degree from the University of New South Wales, Sydney, Australia. His research interests include cloud computing, data mining, data analytics, machine learning, cyber security, and software architecture.
Mohit Chugh is currently a master's student at Deakin University. Before pursuing the master's degree, he worked at EA Games Inc, India, for 1 year after earning a bachelor's degree in IT from Sharda University, Greater Noida, India. His research interest lies in database management systems.
About this article
Cite this article
Zheng, X., Fu, M. & Chugh, M. Big data storage and management in SaaS applications. J. Commun. Inf. Netw. 2, 18–29 (2017). https://doi.org/10.1007/s41650-017-0031-9
DOI: https://doi.org/10.1007/s41650-017-0031-9