Inter-Data-Center Large-Scale Database Replication Optimization – A Workload Driven Partitioning Approach

Min, Hong; Gao, Zhen; Li, Xiao; Huang, Jie; Jin, Yi; Bourbonnais, Serge; Zheng, Miao; Fuh, Gene

doi:10.1007/978-3-319-10085-2_38

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8645)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Inter-data-center asynchronous middleware replication between active-active databases has become essential for achieving continuous business availability. Near-real-time replication latency is expected despite intermittent peaks in transaction volume. To meet this expectation, database tables are divided among multiple parallel replication consistency groups, each with a maximum throughput capacity. Dividing the tables can break transaction integrity when tables updated by a common transaction are replicated by different groups, and it is often not known which tables a common transaction can update. Independent replication also requires balancing resource utilization against latency objectives. Our work provides a method to optimize replication latency while minimizing transaction splits and using a minimum number of parallel replication consistency groups. We present a two-stage approach: log-based workload discovery and analysis, followed by history-based database partitioning. Experimental results from a real banking batch workload and a benchmark OLTP workload demonstrate the effectiveness of our solution, even when partitioning thousands of database tables for very large workloads.
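
The two stages named in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a simple greedy heuristic written under stated assumptions. The transaction records, the row-count load measure, the `capacity` parameter, and all function names are hypothetical. Stage one builds a co-update graph from logged transactions; stage two merges the most frequently co-updated tables into the same group as long as the group stays within its capacity.

```python
from collections import defaultdict
from itertools import combinations


def build_coupdate_graph(transactions):
    """Stage 1 (assumed form): derive per-table load and co-update counts.

    Each transaction is a dict mapping table name -> rows changed.
    """
    load = defaultdict(int)    # total rows changed per table
    weight = defaultdict(int)  # number of transactions updating both tables
    for txn in transactions:
        for table, rows in txn.items():
            load[table] += rows
        for a, b in combinations(sorted(txn), 2):
            weight[(a, b)] += 1
    return dict(load), dict(weight)


def partition_tables(load, weight, capacity):
    """Stage 2 (assumed heuristic): greedy merging under a capacity limit.

    Starts with one group per table, then walks the co-update edges from
    heaviest to lightest and merges the two groups when their combined
    load still fits within `capacity`.
    """
    group_of = {t: t for t in load}
    members = {t: {t} for t in load}
    group_load = dict(load)
    edges = sorted(weight.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _ in edges:
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already replicated together
        if group_load[ga] + group_load[gb] > capacity:
            continue  # merging would exceed the group's throughput limit
        for t in members[gb]:
            group_of[t] = ga
        members[ga] |= members.pop(gb)
        group_load[ga] += group_load.pop(gb)
    return sorted(sorted(m) for m in members.values())


def count_split_transactions(transactions, groups):
    """Count transactions whose tables land in more than one group."""
    group_of = {t: i for i, g in enumerate(groups) for t in g}
    return sum(1 for txn in transactions
               if len({group_of[t] for t in txn}) > 1)


# Hypothetical workload: table names and row counts are invented.
txns = [
    {"ACCOUNT": 4, "LEDGER": 4},
    {"ACCOUNT": 2, "LEDGER": 2},
    {"LEDGER": 1, "AUDIT": 1},
    {"CUSTOMER": 5},
]
load, weight = build_coupdate_graph(txns)
groups = partition_tables(load, weight, capacity=13)
print(groups, count_split_transactions(txns, groups))
```

The sketch only trades off transaction splits against a per-group capacity. It does not pack unrelated tables together to reduce the number of groups, does not handle a single table whose load exceeds the capacity, and does not model latency, all of which the abstract lists as objectives of the actual method.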

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Hong Min
IBM Silicon Valley Lab, San Jose, CA, USA
Xiao Li & Serge Bourbonnais
School of Software Engineering, Tongji University, Shanghai, China
Zhen Gao & Jie Huang
Pivotal Inc., Beijing, China
Yi Jin
IBM System and Technology Group, USA
Miao Zheng & Gene Fuh

Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, 46022, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Knowledge Management, LMU University of Munich, Leopoldstraße 13, 80802, Munich, Germany
Marcus Spies
FAW, University of Linz, Altenbergerstrasse 69, 4040, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Min, H. et al. (2014). Inter-Data-Center Large-Scale Database Replication Optimization – A Workload Driven Partitioning Approach. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_38

DOI: https://doi.org/10.1007/978-3-319-10085-2_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10084-5
Online ISBN: 978-3-319-10085-2
eBook Packages: Computer Science, Computer Science (R0)
