Abstract
Large amounts of data are now generated across organizations, and many of them struggle to handle Big Data efficiently because of its volume, velocity, and variety. Although data is a valuable resource, organizing Big Data remains a major challenge. Many companies have adopted NoSQL databases such as Cassandra, MongoDB, and HBase, which can serve many requests at a time. For processing Big Data, Apache Spark is one of the most powerful engines available. The main programming abstraction in Apache Spark is the Resilient Distributed Dataset (RDD), which supports only procedural processing. However, the most common data processing paradigm is the relational query, which RDDs do not handle directly. To overcome this, higher-level libraries on top of Apache Spark are needed. Spark SQL is one of the newer components of the Apache Spark framework; it integrates relational processing with Apache Spark's functional programming API. It lets Apache Spark programmers use the benefits of relational processing, and it combines relational and procedural processing through a declarative DataFrame API. In this study, Spark SQL DataFrames are used to improve the processing of weather data stored in a Cassandra database. The results show that Spark SQL DataFrames outperform the Spark Core RDD approach that we evaluated in our earlier work.
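The contrast between procedural and relational processing can be illustrated with a minimal sketch. It is not the code used in the study and does not use Spark or Cassandra: it relies only on the Python standard library, with `sqlite3` standing in for a relational engine. The sample readings, the `weather` table, and the function names are hypothetical and exist only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical weather readings: (station_id, date, temperature in deg C).
READINGS = [
    ("S1", "2017-01-01", 21.0),
    ("S1", "2017-01-02", 23.0),
    ("S2", "2017-01-01", 15.0),
    ("S2", "2017-01-02", 17.0),
    ("S2", "2017-01-03", 19.0),
]


def avg_temp_procedural(readings):
    """Procedural style: the caller spells out how to group and aggregate,
    much as one would with map/reduce-style operations on an RDD."""
    sums = defaultdict(lambda: [0.0, 0])  # station -> [running total, count]
    for station, _date, temp in readings:
        acc = sums[station]
        acc[0] += temp
        acc[1] += 1
    return {station: total / count for station, (total, count) in sums.items()}


def avg_temp_relational(readings):
    """Relational style: the caller states what result is wanted and the
    engine decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE weather (station TEXT, day TEXT, temp REAL)")
        conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", readings)
        rows = conn.execute(
            "SELECT station, AVG(temp) FROM weather GROUP BY station"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(avg_temp_procedural(READINGS))
    print(avg_temp_relational(READINGS))
```

Both functions return the same per-station averages. The relational version leaves the execution strategy to the engine, which is the property that lets a query optimizer improve performance without changes to user code.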
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Anusha, K., Usha Rani, K. (2020). Performance Evaluation of Spark SQL for Batch Processing. In: Venkata Krishna, P., Obaidat, M. (eds) Emerging Research in Data Engineering Systems and Computer Communications. Advances in Intelligent Systems and Computing, vol 1054. Springer, Singapore. https://doi.org/10.1007/978-981-15-0135-7_13
DOI: https://doi.org/10.1007/978-981-15-0135-7_13
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0134-0
Online ISBN: 978-981-15-0135-7
eBook Packages: Intelligent Technologies and Robotics (R0)