
1 Introduction

Blockchain stores and verifies data through a chained data structure and secures data transmission and access with cryptography [1]. It offers high credibility, traceability, and decentralization [2], and removes the need to trust a third party for data storage [3]. As blockchain technology has developed and data has accumulated across industries, the traditional single-chain blockchain system can no longer meet increasingly complex application scenarios. Master-Slave Blockchain (MSBC) structures (such as Spark Chain) have begun to attract the attention of experts and scholars in the field, and are gradually being applied in education, medical care, security [4], and other fields [5, 6].

A master-slave blockchain usually consists of a master chain and slave chains, composed of master blocks and slave blocks respectively, and each master block has exactly one slave chain. The master-slave structure can handle applications with complex classification scenarios. For example, in the financial field, a master-slave blockchain can be used to build a blockchain system for companies that generate financial activities: the master chain stores the information of financial institutions, and the corresponding slave chains store their transaction events, financial activities, and other data. Through the blockchain consensus mechanism [7], the data cannot be tampered with.
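The master-slave organization described above can be sketched as a minimal data structure; the class and field names here are illustrative assumptions, not definitions from the paper:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SlaveBlock:
    """A block on a slave chain, e.g. one transaction or financial event."""
    data: str
    prev_hash: str = ""

@dataclass
class MasterBlock:
    """A block on the master chain; each master block anchors exactly one slave chain."""
    institution: str
    prev_hash: str = ""
    slave_chain: List[SlaveBlock] = field(default_factory=list)

# One master block per financial institution; its activities live on its slave chain.
master_chain = [MasterBlock("Bank A"), MasterBlock("Bank B")]
master_chain[0].slave_chain.append(SlaveBlock("transfer #1"))
```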

As the data scale keeps growing, the problems of low query efficiency and long tracing time in blockchain systems [8, 9] become increasingly serious. However, existing blockchain indexes are designed only for single-chain structures, and their query efficiency and dynamic maintainability are poor. How to build an efficient and dynamically maintainable index structure on a master-slave blockchain has therefore become a pressing and difficult research problem.

To solve the above problems, this paper proposes a composite index construction method (MSCI) based on master-slave blockchain structure. The main contributions of this paper are as follows:

  1. A weight matrix is constructed by integrating node load, node credit, and network quality, and a weight-matrix-based slicing algorithm (WMS) is proposed to realize dynamic slicing of the master-slave blockchain structure based on the characteristics of the master chain.

  2. On this basis, to address the fact that existing blockchain indexes do not fit the master-slave chain structure, an index construction method based on jump consistent hashing is proposed for the master chain. The key values of master chain nodes are mapped to index slots to enable efficient queries of master chain information.

  3. On the slave chain, combining the data characteristics of blockchain, an index construction method based on an improved Bloom filter is proposed.

  4. The proposed method is compared with existing methods under different constraints and on different data sets, verifying its effectiveness.

2 Related Work

At present, many scholars have conducted in-depth research on the index construction of blockchain and achieved certain research results.

Vikram Nathan et al. [10] proposed Flood, a multi-dimensional in-memory read-optimized index that automatically adapts to a specific data set and workload by jointly optimizing the index structure and data storage layout. However, this method must regularly evaluate the query cost of the current layout and completely rebuild the index for different workloads. Huang et al. [11] proposed EBTree, a B-tree-based index that supports real-time top-k, range, and equality search over Ethereum blockchain data. However, its index nodes are stored separately in LevelDB, and its query efficiency is strongly affected by node size. Andreas Kipf et al. [12] proposed RS (RadixSpline), a single-pass learned index that can be built in a single pass over sorted data and works well on most data sets, but its performance degrades as the data set grows. Xing et al. [13] proposed SCATC, a sub-chain-based index for account transaction chains, which divides the transaction chain into sub-chains and converts traversal of the transaction chain into sub-chain queries via pointers to reduce computational overhead; however, it only improves query efficiency for longer account transaction chains and only optimizes queries in plaintext. Noualhamdi et al. [14] proposed ChainLink, a scalable distributed index for large time-series data, which designs a two-layer distributed index structure and realizes query processing through partition-level data reorganization. However, the data must be reorganized locally, which cannot guarantee data security. Gao Yuanning et al. [15] proposed Dabble, a scalable learned index model based on a middle layer, which uses the K-Means clustering algorithm to divide the data set into K regions according to the data distribution, trains a neural network for each region, and predicts data positions through the neural network model. However, this model must be retrained whenever the data set is updated, its timeliness is poor, and the value of K strongly affects its accuracy.

To sum up, this paper studies index construction for the master-slave blockchain in depth. Addressing the shortcomings of existing index methods, and considering the space-time complexity and data security of index construction, a composite index construction method based on the master-slave blockchain structure is proposed.

3 Blockchain Slice Method Based on Master-Slave Structure

To achieve efficient index construction and query processing on the master-slave blockchain structure, the whole master-slave blockchain is first sliced based on the characteristics of the master chain, and each slice is assigned a weight. On this basis, the slice weight matrix of the whole master-slave blockchain structure is constructed, and the number of nodes in each slice is determined from the weight matrix, which provides support for indexing the master chain and the slave chains.

3.1 Construction of Weight Matrix

Let the number of nodes in the master-slave blockchain be \(x\), and divide the master-slave blockchain into \(y(y \ll x)\) slices, where the i-th slice is \(f_{i} (i = 0,1,2, \cdots ,y - 1)\) and the weight of each slice is \(\omega_{i}\). The slice weight is determined by the weights of three dimensions: node load, node credit, and network quality, in which node credit and network quality are positively correlated with the slice weight, while node load is negatively correlated. Because the units of these three dimensions are not uniform, normalization is needed before calculating the slice weight.

Among them, the normalization formula of node load is:

$$ d_{ij} = \left\lfloor {\frac{1}{{\log (d^{\prime}_{ij} + 1) + 1}} \times \frac{x}{y}} \right\rfloor $$
(1)

The normalization formula of node credit and network quality is:

$$ d_{ij} = \left\lfloor {\frac{{d^{\prime}_{ij} + 1 - min}}{max + 1 - min} \times \frac{x}{y}} \right\rfloor $$
(2)

The weight of the i-th slice is \(\omega_{i}\), and the calculation formula is shown in formula (3).

$$ \omega_{i} = \sum\limits_{j = 1}^{3} {\omega_{ij} d_{ij} } $$
(3)

\(d^{\prime}_{ij}\) in formulas (1) and (2) represents the original value of node load, node credit, or network quality, \(d_{ij}\) is the normalized value, and max and min are the maximum and minimum values of the corresponding component, respectively. The \(\omega_{ij}\) in formula (3) represents the weight of each component.

After the weight of each slice is obtained, a two-dimensional weight matrix M[p][q] is constructed from the slice weights \(\omega_{i}\) (where \(p \le \sqrt{y}\), and p and q are integers). For any slice f, \(M[\lfloor f/p \rfloor][f\,\%\,q] = \omega_{f}\), and empty positions in the matrix are set to 0.
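Formulas (1)–(3) and the matrix placement rule can be sketched as follows. The component weights, the value ranges of credit and quality, the per-slice raw values, and the base-10 logarithm in formula (1) are illustrative assumptions:

```python
import math

x, y = 1000, 16                       # total nodes and number of slices (illustrative sizes)

def norm_load(raw):
    # Formula (1): node load, inversely related to the slice weight (log base 10 assumed).
    return math.floor(1.0 / (math.log10(raw + 1) + 1) * x / y)

def norm_pos(raw, lo, hi):
    # Formula (2): node credit / network quality, positively related to the slice weight.
    return math.floor((raw + 1 - lo) / (hi + 1 - lo) * x / y)

def slice_weight(load, credit, quality, comp_w=(1/3, 1/3, 1/3)):
    # Formula (3): weighted sum of the three normalized components d_ij.
    d = (norm_load(load), norm_pos(credit, 0, 100), norm_pos(quality, 0, 100))
    return sum(w * v for w, v in zip(comp_w, d))

# Place each slice weight into the two-dimensional matrix: M[f // p][f % q] = w_f.
p = q = math.isqrt(y)                 # p <= sqrt(y)
M = [[0.0] * q for _ in range(p)]
for f in range(y):
    M[f // p][f % q] = slice_weight(load=5 + f, credit=80, quality=90)
```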

3.2 Determining the Number of Nodes in a Slice Based on Weight Matrix

The number of nodes in each slice is determined from the weight matrix in Sect. 3.1. Firstly, the slice weights in the matrix are linearly normalized. Secondly, all normalized slice weights are proportionally discretized onto the integer interval [1, Q]; each slice's final proportional weight determines the number of nodes it corresponds to (a slice with the maximum proportional weight Q corresponds to Q - 1 nodes).

Number the nodes consecutively as {0, 1, 2, …, \(\sum\limits_{i = 0}^{y - 1} {\omega_{i} } - 1\)}; the node numbers corresponding to the k-th slice are [\(\sum\limits_{i = 0}^{k - 1} {\omega_{i} }\), \(\sum\limits_{i = 0}^{k} {\omega_{i} } - 1\)]. After obtaining a node number, the corresponding slice number can be found by table lookup.

Among them, the linear normalization formula is:

$$ \omega ^{\prime}_{i} = \frac{{\omega_{i} - \omega_{min} }}{{\omega_{max} - \omega_{min} }} $$
(4)

\(\omega_{min}\) in formula (4) is the minimum value of all slice weights, \(\omega_{max}\) is the maximum value, and \(\omega ^{\prime}_{i}\) is the normalized slice weight.
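A small sketch of this discretization and node numbering; the example slice weights and the value of Q are hypothetical, and the proportional rounding is one plausible reading of the discretization step:

```python
weights = [0.8, 2.4, 1.6, 3.2]        # example slice weights (hypothetical)
Q = 10                                 # upper bound of the discrete interval [1, Q]

w_min, w_max = min(weights), max(weights)
# Formula (4): linear normalization to [0, 1].
norm = [(w - w_min) / (w_max - w_min) for w in weights]
# Discretize proportionally onto [1, Q]; the result is each slice's node count.
nodes_per_slice = [round(1 + n * (Q - 1)) for n in norm]

# Number nodes consecutively; slice k owns the inclusive range [start, end].
ranges, start = [], 0
for cnt in nodes_per_slice:
    ranges.append((start, start + cnt - 1))
    start += cnt

def slice_of(node_id):
    # Table lookup: find which slice a node number belongs to.
    for k, (lo, hi) in enumerate(ranges):
        if lo <= node_id <= hi:
            return k
```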

Based on the weight matrix in Sect. 3.1, a weight-matrix-based slicing algorithm (WMS) is proposed, shown in Algorithm 1, which realizes the slicing of the master-slave blockchain structure.

[Algorithm 1]

4 Methods for Constructing Composite Index

Because the data scales and information types stored in the master chain and slave chains of the master-slave blockchain structure differ, after slicing with the WMS algorithm, a composite index construction method based on jump consistent hashing and an improved Bloom filter is proposed to meet the query requirements of both the master chain and the slave chains. A schematic diagram of composite index construction is shown in Fig. 1.

Fig. 1. Composite index construction schematic

4.1 Construction of Master Chain Index Based on Jump Consistent Hash

Based on the characteristics of the data stored in the master chain, this paper introduces the jump consistent hash algorithm and proposes a jump-consistent-hashing-based master chain index construction method (JHPI) to realize fast construction of the master chain index. Firstly, the number of index slots is determined by the number of nodes in each master chain slice; secondly, the key value of each node is determined by the hash of the data stored on the master chain; finally, each node's key value and the number of index slots are input, and the master chain index is output.

When the number of nodes in a slice changes, nodes jump in the index and some keys are remapped. Let ch(key, num_buckets) be the hash mapping function that produces the jump, where key is a node's key value and num_buckets is the number of slots. When the number of slots changes from n to n + 1, a fraction n/(n + 1) of the keys keep their slot, that is, ch(key, n + 1) = ch(key, n), and a fraction 1/(n + 1) of the keys are remapped to the new slot, that is, ch(key, n + 1) = n.

Let b be the result of the node's last jump and j the result of the next jump. For any \(i(i \in [b + 1,j - 1])\), the probability that the node does not jump while the slot count grows to i is shown in formula (5).

$$ P(i) = \frac{b + 1}{{b + 2}} \times \frac{b + 2}{{b + 3}} \times \cdots \times \frac{i - 1}{i} = \frac{b + 1}{i} $$
(5)
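The jump behavior above is exactly what the canonical jump consistent hash of Lamping and Veach provides, and it can serve as the mapping ch(key, num_buckets). The following is a standard implementation of that published algorithm, not code taken from the paper:

```python
def ch(key, num_buckets):
    """Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets)
    such that growing the bucket count from n to n + 1 remaps only ~1/(n + 1)
    of the keys, consistent with the stay probability in formula (5)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step (constants from the published algorithm).
        key = (key * 2862933555777941757 + 1) % (1 << 64)
        # Next candidate jump, derived from the stay probability (b + 1) / i.
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
```

When the slot count grows from 10 to 11, each key either keeps its slot or moves to the new slot 10, which is the remapping property JHPI relies on.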

The construction method of the master chain index based on jump consistent hash is shown in Algorithm 2.

[Algorithm 2]

4.2 Construction of Slave Chain Index Based on Improved Bloom Filter

In the process of constructing the slave chain index, the data structure of the Bloom filter is redesigned, and a column-based selection function is proposed, yielding the improved-Bloom-filter-based slave chain index construction method (IBF).

Firstly, the data structure is a two-dimensional array A[p][q], where \(p = 2^{n}\) (n is a positive integer) and the data length of each column is \(l_{q}\), assumed to be 32 or 64 bits. This value is determined by the word length of general-purpose CPU registers, so as to reduce memory accesses and improve query performance. Let the K hash functions of the improved Bloom filter be Hash(key), where key is the key value of a node, and let len be the number of element positions an improved Bloom filter can store; len is calculated as shown in formula (6).

$$ len = p \times q = 2^{n} \times l_{q} $$
(6)

After the index data structure is constructed, the slave chain index corresponding to each master block is built in the following steps:

Step 1: Use the column selection function to map the element to its corresponding column; the element is confined to that column.

Step 2: Obtain the bit positions through the K hash functions.

Step 3: Set the corresponding bit positions to 1; the remaining positions stay 0.

The column selection function in Step 1 is based on the SHA256 hash function, with the column obtained by a modulo operation, as shown in formula (7):

$$ qv = v\% 2^{n} $$
(7)

In Step 2, the K hash functions are implemented by K bitwise AND operations, which can be expressed as:

$$ pv^{i} = v\,\& \sum\limits_{j = (i - 1) \times \frac{k^{\prime}}{k}}^{i \times \frac{k^{\prime}}{k} - 1} {2^{j} } \quad (0 < i \le k) $$
(8)

In formula (7), v is the element inserted into the Bloom filter and \(qv\) is the column number obtained by column selection; once the column is determined, the element is confined to that column during both construction and query. In formula (8), \(pv^{i}\) is the row number (i.e., the corresponding bit position) obtained by the i-th of the K applications of formula (8) within the selected column, and \(k^{\prime}\) is the array length of a conventional Bloom filter. The slave chain index construction algorithm based on the improved Bloom filter is shown in Algorithm 3.
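Steps 1–3 and formulas (6)–(8) can be sketched as follows. The parameter values (N, LQ, K, KP) and the way the SHA-256 digest is sliced into row indices are illustrative assumptions, not the paper's exact construction:

```python
import hashlib

N = 4                  # p = 2^N columns (small for illustration)
LQ = 64                # bits per column word; formula (6): len = 2^N * LQ
K = 3                  # number of hash functions
KP = 18                # k': bit budget of a conventional Bloom filter (assumed)

columns = [0] * (1 << N)   # each column is a single LQ-bit word

def digest(key):
    # SHA-256 of the key, the basis for column selection (formula (7)).
    return int.from_bytes(hashlib.sha256(str(key).encode()).digest(), "big")

def positions(key):
    v = digest(key)
    col = v % (1 << N)                       # formula (7): qv = v % 2^N
    width = KP // K
    rows = []
    for i in range(1, K + 1):
        # Formula (8): AND v with a mask covering the i-th slice of k'/K bits.
        mask = ((1 << width) - 1) << ((i - 1) * width)
        rows.append(((v & mask) >> ((i - 1) * width)) % LQ)
    return col, rows

def insert(key):
    col, rows = positions(key)
    for r in rows:                            # Step 3: set the selected bits to 1
        columns[col] |= 1 << r

def query(key):
    col, rows = positions(key)                # Steps 1-2 confine lookup to one column
    return all(columns[col] >> r & 1 for r in rows)
```

Confining each element to a single column means a query touches only one machine word, which is the memory-access saving the register word length choice aims at.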

[Algorithm 3]

5 Experiments

The experimental environment consists of 16 servers, each with 1 TB of storage, 8 GB of RAM, and a 4-core CPU. The servers communicate over a high-speed network, and each runs Ubuntu 18.04. Two data sets are used for experimental verification. The first is the first 3,000,000 blocks of the public Ethereum network, containing 15,362,853 transactions. The second is the Lognormal artificial data set, which samples 5 million non-duplicate values from a lognormal distribution (mean 0, variance 2). This section verifies the high efficiency and low memory consumption of MSCI from three aspects: index construction time, query time, and memory consumption.

5.1 Comparison of Index Construction Time

To verify the efficiency of the proposed MSCI method in terms of index construction time, it is compared with the EBTree method, which modifies the blockchain structure, and the Dabble model, which uses a neural network. In EBTree, the capacity of internal nodes is set to 128 and that of leaf nodes to 16. The value of K in the Dabble model is 100, and the number of nodes in each slice of MSCI is set to 100. The experiment covers the three cases listed in Table 1.

Table 1. Index build time comparison.

For the cases in Table 1, the experimental results of the index construction time comparison are shown in Fig. 2 and Fig. 3.

Fig. 2. Index build time comparison 1

Fig. 3. Index build time comparison 2

As can be seen from Fig. 2 and Fig. 3, as the data volume increases, the index construction time of MSCI improves on the existing methods by about 4.79% and 4.47%, respectively.

5.2 Comparison of Query Time

In this experiment, to test the query performance of MSCI, we compare its query response time with EBTree when the number of master blocks is 500k, 1000k, 1500k, 2000k, 2500k, and 3000k and each slave chain holds 1000 blocks, and with Dabble when the master blocks store 100k, 200k, 300k, 400k, and 500k pieces of data and each slave block holds 1000 pieces of data. The experimental results are shown in Fig. 4.

Fig. 4. Index query time comparison

As can be seen from Fig. 4, the query time of MSCI improves on the existing methods by about 7.44% and 5.89%, respectively, and as the data volume increases, MSCI's advantage over EBTree becomes more obvious.

5.3 Comparison of Memory Consumption

The weight matrix constructed by MSCI in the slicing stage takes up almost no memory, and the master chain index is built with the jump consistent hash algorithm, which incurs almost no extra memory compared with classical consistent hashing; the memory overhead of MSCI is therefore dominated by the slave chain index construction. The false-positive rate of IBF is set to 0.0137. The EBTree method rewrites the blockchain structure, so its memory consumption is mainly block data; this section therefore compares only with the Dabble method. The experimental results are shown in Table 2.

Table 2. Memory Consumption Comparison.

As can be seen from Table 2, the Lognormal data set occupies 24 MB of memory; the neural network in Dabble occupies 4 KB, while the IBF in MSCI occupies only about 2.048 KB within the allowed false-positive range. Even at the same false-positive rate, IBF and a standard Bloom filter remain in the same order of magnitude.

6 Conclusion

With the wide application of blockchain technology, the traditional single-chain structure has gradually failed to meet practical needs. Adding slave chains to the single chain allows blockchain to be applied to a wider range of fields, such as education chains and food chains. This paper proposes a composite index construction method for the master-slave blockchain. Specifically, the whole master-slave blockchain structure is sliced according to the master chain, and a weight matrix is used to improve the maintainability of the slicing, which provides support for index construction. On this basis, considering the different data scales of the master chain and slave chains, an index construction method based on the jump consistent hash algorithm and a slave chain index construction method based on IBF are proposed to improve the query efficiency of the master-slave blockchain. Experimental results show that the proposed method has clear advantages over existing methods in construction time, query efficiency, and memory consumption.