1 Introduction

With the development of semiconductor technology to ultra-deep submicron and nano-scale, the integration and complexity of very large scale integrated circuit increase rapidly. It faces huge challenges in high test power consumption and growing test data volume [11], more memory occupancy and prolonged test application time results in high test cost, which seriously impacts on the integrated circuits manufacturing. Traditional test scheme using automatic test equipment (ATE) is an inefficient method for advanced system-on-a-chip (SoC) with a large amount of test data volume. The testing of integrated circuits with random test patterns will result in high power consumption in test mode [14]which is hazardous to functionality of integrated chip [10]and may cause instant circuit damage, decreased system reliability and an undesired yield loss [15, 22].

Test set are randomly generated by Automatic Test Pattern Generation (ATPG) tool in Built-In Self-Test (BIST) architecture, which is performed on LFSR (linear-feedback shift-register) reseeding scheme with seed vectors, it significantly reduces test-data storage and transmission bandwidth. For given a deterministic test cube, the corresponding seed vectors can be achieved by solving a set of linear equations (one equation for each specified bit) based on the feedback polynomial of the LFSR circuit. Only 1%–5% specified bits in a test set is applied to solve a seed vector. It was determined in [7] that the probability of unsolvable is less than 10−6, the LFSR size must be larger than Smax +20, where Smax is the largest number of specified bits in all test cubes. Furthermore, multiple-polynomial LFSRs [5] or variable-length multiple polynomial LFSRs [6] are used to reduce the total bits of seeds. An LFSR with Smax size is sufficient to encode all deterministic test cubes for circuits testing [4, 20].

The unspecified bits in test cubes are filled with pseudo-random number though LFSR reseeding, so excessive switching activity occur when they are shifted into a scan chain, which consumes a lot of test power. Hence, the switching activities in deterministic pattern generation are effectively reduced in [21], and the low transition random test pattern generator reduces switching activity of CUT by reducing transitions during scan testing [17]. A low power scheme adopts dual LFSR reseeding [13] and scan slice overlapping [23], respectively. Each pattern is partitioned into several overlapping slice sets, no transition is produced and no specified bits need to be generated via LFSR reseeding in the overlapping block [23]. A low-power linear-decompression scheme which operates X-specified and X-filling successively enhance seed encoding efficiency and reduce shift and capture power consumption [16]. And hold cubes-based low power test scheme are presented, while additional test storage and hardware overhead are required for the hold cubes [9]. A low power scan architecture is used to generate tests for stuck-at faults [12]. Above-mentioned, this paper develops a LFSR reseeding-oriented low-power test-compression scheme for scan testing.

2 LFSR Reseeding Procedure

LFSR reseeding generates pseudo-random sequences for scan testing in a typical test compression scheme. External LFSR circuit shown in Fig. 1. ak ∈ {0, 1} denotes output sequence of flip-flop, ci ∈ {0, 1} denotes a constant coefficient, A(0) indicates the initial seed vector. The states of all registers are represented by L-dimensional vectors shown as follows:

$$ \mathbf{A}(t)={a}_1(t){a}_2(t){a}_3(t)\cdots {a}_i(t)\cdots {a}_L(t) $$
(1)
Fig. 1
figure 1

LFSR architecture-based XOR external-connection

The state transfer equation is summarized as follows:

$$ \mathbf{A}\left(t+k\right)={\mathbf{V}}^k\cdot \mathbf{A}(t) $$
(2)

For a scan design circuit, most faults can be detected through pseudo-random test vectors, the remains can be detected through deterministic test vectors. The latter vectors are coded as a short seed set and then stored in ROM. During LFSR reseeding, the seed vector is automatically loaded to scan chain for a certain clock period until all seed vectors are expanded into target test pattern for scan testing.

Supposing a given LFSR with L order-characteristic polynomial and test cube contains s specified bits, which means that the linear system of equations contains L unknowns and s equations. Each specified bit corresponds to one state equation, so LFSR seed vector is solved from all equations by Gaussian-Jordan elimination method. Since the compression ratio of LFSR reseeding scheme largely depends on the number of specified bits, the number of specified bits is reduced to decrease the number of equations, which results in a short length of seed vector and small size of LFSR.

A reseeding example is illustrated by three-order LFSR in Fig. 2. Si ∈ {0, 1}denotes the state of scan chain. Supposing test vector 1X1010X1X loaded into scan chain, LFSR characteristic polynomial is f(x) = 1 + x2 + x3, the state transition matrix is \( \mathbf{V}=\mid {\displaystyle \begin{array}{ccc}0& 1& 1\\ {}1& 0& 0\\ {}0& 1& 0\end{array}}\mid \).

Fig. 2
figure 2

A three-order LFSR connected to scan chain

Only the specified bits is applied to solve the seed vector, the state equations are simplified in Eq. 3:

$$ \left[\begin{array}{l}{\mathrm{S}}_2(9)\\ {}{\mathrm{S}}_4(9)\\ {}{\mathrm{S}}_5(9)\\ {}{\mathrm{S}}_6(9)\\ {}{\mathrm{S}}_7(9)\\ {}{\mathrm{S}}_9(9)\end{array}\right]=\left[\begin{array}{ccc}0& 1& 0\\ {}0& 1& 1\\ {}1& 1& 0\\ {}1& 1& 1\\ {}1& 0& 1\\ {}0& 1& 0\end{array}\right]\cdotp \left[\begin{array}{l}{a}_1\\ {}{a}_2\\ {}{a}_3\end{array}\right]=\left[\begin{array}{l}1\\ {}0\\ {}1\\ {}0\\ {}1\\ {}1\end{array}\right] $$
(3)

Therefore, the equations have a unique solution: A(0) = a1a2a3 = 011, and test cube 1X1010X1X is encoded as seed vector 011, it is loaded into LFSR circuit as initial state for scan testing. After 9 clock cycles, the test pattern 1,110,100,011 compatible with the original test cube is generated.

3 LFSR Reseeding-Based Test Compression Scheme

3.1 Block Encoding Rule

To reduce the number of specified bit for high compression ratio and enhance the consistency between adjacent bits for low test power consumption, test cube is divided into 4 types of data blocks with equal length, namely: 0 compatible, 1 compatible, incompatible and the unspecified blocks. Different types of blocks are coded with different block markers. The block encoding rule is shown in Table 1. Here, 0 compatible block only includes specified bits 0, 1 compatible block only includes specified bits 1, incompatible block includes two kinds of specified bits (1 and 0), unspecified blocks include no specified bit. A hold flag denotes block types, it indicates whether this data block is directly generated by LFSR circuit or not. In addition, the first bit in 0 and 1 compatible block are set to 0 and 1, respectively. Only the first bit in compatible block is generated by LFSR circuit, while the other bits are automatically consistent with the first bit. Obviously, the encoding rule effectively reduces the number of specified bits, and the optimized filling scheme is involved in reducing the shift power consumption.

Table 1 Block encoding rule

A test cube is called to illustrate the block encoding rule in Fig. 3. Original test cube with 15 specified bits is divided into four type’s blocks, while the encoded test cube only contains 9 specified bits (3 for hold flags and 6 for specified bits). Thereby, a high compression ratio can be achieved in this way. Moreover, no logic transitions occur in blocks 2 and 3 because all data bits in block keep unchanged during LFSR reseeding, which is beneficial for test power consumption reduction.

Fig. 3
figure 3

Test data encoding illustration

3.2 Test Cube Optimized Encoding Procedure

The optimized encoding procedure is demonstrated as follows. As shown in Table 2, there are 67 specified bits in a 180 bit test set, it consists of five 32-bit test cubes, and each test cube is divided into four 8-bit data blocks.

Table 2 Original test cube

According to block encoding rule shown in Table 1, each block is encoded as a 1-bit hold flag and 8-bit test data in Table 3. The encoded test cube only contains 47 specified bits (15 for hold flag and 32 for specified bit). Compared to the original test cube, specified bits are reduced 20 bit through optimized block encoding. Hold flag taken as 0 is regarded as compatible block, which devotes to test power consumption reduction to a large extent.

Table 3 Encoded test cube

Above all, the number of specified bits in test set determines the test storage for LFSR reseeding, and the additional hold flag will increase the specified bits. Hence, there is a trade-off between power consumption and test storage.

3.3 Dividing into Hold Flag Partition Sets

Hold flag cube shown in Table 4 (a) is constructed by all hold flags of each block in Table 3, which includes 15 specified bits. It is noticed that lots of hold flags are compatible with each other. Namely, there is no any conflict among all specified bits in blocks. If a hold flag could be shared by multiple test cubes, it only need to be loaded once for each hold flag-compatible set during scan testing. In Table 4 (b), all test cubes are reordered and separated into a compatible set with the compatible hold flag. In Table 4(c), hold partition set is further updated by extra update hold flag, which indicates whether hold flag for the current test cube needs to be updated or not.

Table 4 Updating hold partition sets

Noteworthy, five test cubes are grouped into two hold flag-compatible sets. The first set contains test cubes 2, 4 and 5, whose update flag bit and hold flag cube are 100 and 1000.The second set contains test cubes 1 and 3, whose update flag bit and hold flag cube are 10 and 0110.

A test set is called to illustrate the optimized encoding scheme in Table 5. Obviously, hold flag cubes are only loaded twice during scan testing, which significantly reduces test application time.

Table 5 Optimized test set

Compared to the encoded test cube shown in Table 3, the number of specified bits is further reduced 2 bit through updating hold flag partition. Actually, the scale of test set in industrial circuit is much larger than that of test set shown in Table 5. Thus, the number of specified bits are extremely reduced in this way, which overwhelmingly reduced the number of equations and the computational complexity to solve the seeds.

4 Implementation of Optimized Encoding Algorithm

To implement the above optimized encoding algorithm, the proposed LFSR reseeding-based test compression scheme is illustrated in Fig. 4.

Fig. 4
figure 4

Proposed LFSR reseeding-based test compression scheme

To summarize, block size B, LFSR size L and Lmin are initialized, test cubes are divided into equal blocks with size B. The optimized encoding algorithm is applied to these blocks until the minimum seed size Lmin is solved from an appropriate block for LFSR reseeding. B dynamically changes from 1 to 100 by a greedy algorithm for efficient test compression. Lmin solving process is implemented in Table 6.

Table 6 Lmin solving process

5 Hardware Implementation of Decoder Architecture

5.1 Decoder Architecture

LFSR reseeding circuit is demonstrated in Fig. 5, which consists of ATE, LFSR, phase shifter, finite state machine (FSM), decoder architecture and circuit under test (CUT). Here, FSM controller is composed of a bit counter and some small combinational logic. 1-bit Update Hold Flag-shifter (UF-SR) and Hold Flag-shifter (HF-SR) are used to store update hold flag and hold flag, respectively. Sign-SR (Sign-SR) is used to store the first sign bit of each block. Obviously, hardware overhead is mainly consumed on a small FSM controller, two 2-to-1 MUX, one HF-SR and HF-SR per scan chain, two 2-to-1 MUX and one Sign-SR per block.

Fig. 5
figure 5

Hardware architecture of LFSR reseeding circuit

Obviously, the proposed hardware architecture requires additional circuitry such as LFSR, phase shifter and decoder architecture, which would change the timing analysis of BIST architecture, so a timing analysis is further conducted to provide more timing information in Fig. 6. During scan testing, seed vectors pre-stored in ROM are loaded into LFSR reseeding circuit under the control of LFSR_clk, these seeds are decompressed into test patterns in decoder architecture under the control signal of the FSM. These test patterns are scanned into multiple scan chains for circuit under testing in parallel under the control of Scan_clk. Test response is captured and then compressed by a compactor for comparing with golden seed vectors in ATE for test results.

Fig. 6
figure 6

Timing diagram of test data decompression

5.2 Decoding Procedure

Each scan chain is divided into one or more blocks. Supposing size B is the number of blocks per scan chain. Each scan chain has a HF-SR whose size is equal to B. LFSR reseeding is used to generate all test data for each test cube, which consists of three components, namely: update flag, hold flag and test data with a sign flag. The format for test data coming out of LFSR reseeding for each test cube is given in Fig. 7.

Fig. 7
figure 7

Data format generated by LFSR-reseeding

A small finite-state machine (FSM) controller controls whether the data coming out from the LFSR is stored or not. In the first clock cycle, LFSR circuit generates a single update hold flag bit and stores it in UF-SR. If the update hold flag is 1, LFSR circuit in next B clock cycles generates hold flags for each scan chains and loads them into HF-SRs. If update hold flag is 0, the HF-SRs are not loaded. Supposing the length of each scan chain be M. LFSR circuit generates test data for the next M clock cycles. During each M/B clock cycle, if the corresponding hold flag is 1, the scan chain is loaded from the LFSR, If the corresponding hold flag is a 0, the LFSR generates a single bit, which is the sign flag, its’ value is repeatedly shifted into the scan chain for M/B clock cycle, and test data from the LFSR is ignored.

After each M/B clock cycle, the HF-SR is shifted so that the next hold flag becomes active for its corresponding block and used as the control signal of a multiplexer. Once the scan chains is filled, test pattern is applied to circuit under test, and the test response is captured from scan chain. The above processing is repeated to generate the next test pattern. Noteworthy, the size of the HF-SR and Sign-SR dominantly determines the resource occupation, which depends on the number of scan chains and the total number of blocks.

5.3 Decoding Timing Analysis

As shown in Table 7, in order to verify the designed decode architecture, four test cubes with four data blocks are taken as an encoding example to solve the seed vector.

Table 7 Test sets after optimization of encoding and grouping

Four seed vectors are expanded to four test cubes by LFSR reseeding in Table 8. Obviously, all decoded test cubes are completely compatible with the original test cubes.

Table 8 An illustration of LFSR reseeding

Moreover, the functional verification and timing analysis of decoder is performed on Mentor’s EDA software Modelsim tool. The simulation results of Fig. 8 are in perfect agreement with the timing analysis.

Fig. 8
figure 8

Simulation results of decompression flow

6 Circuit Simulation and Results Analysis

The experiments for the six largest ISCAS Benchmark circuits are performed on CPU with the Intel core 3.2 GHz and 4G RAM, and test sets are generated by Atalanta ATPG tool with dynamic compaction. Basic information of Benchmark circuit is shown in Table 9. PI and PO represent the numbers of input and output port, respectively. INV, LG and FA indicates the total numbers of inverter, logic gate and stuck-at-fault.

Table 9 Basic information of Benchmark circuit

Pseudo random test set is applied to detect easily faults. Furthermore, LFSR reseeding scheme is used to test faults that are failure to be detected after 10,000 pseudo random test patterns, whose test set is generated by using ATPG tool (Lee & Ha, 1993). Fault simulation is performed on HOPE (Lee& Ha, 1992) simulator.

6.1 Test Compression Ratio Analysis

As shown in Table 10, the proposed scheme are compared with previous encoding works in the compression ratios, Smax is the largest number of specified bits in all test cubes. Compression ratio is defined as CR %  = [(| TD|  − | TE| )/TD] × 100%, where ∣TD∣ is the size of original test set, ∣TE∣ is the size of seed vectors.

Table 10 Compression ratio comparison with previous encoding scheme

Obviously, compared with EFDR [3] and Hybrid coding [19] scheme, the proposed optimized encoding algorithm achieves a better compression effect for overwhelming majority circuits, and the average compression ratio achieves up to 92.9%. For circuits S15850 and circuits S38417, the compression ratio is slightly lower than that of hybrid coding scheme. However, the decoding structure of hybrid coding scheme is more complex, thus it causing long test application time and high hardware overhead.

In addition, the compression ratios compared with other LFSR reseeding schemes are presented in Table 11. Obviously, the proposed LFSR reseeding-based test compression scheme achieves a high compression ratio for all circuits than Dual-LFSR [13], Modified scan-slice [23] and low-power LFSR [9], the average compression ratio achieves up to 93.2%.

Table 11 Compression ratio compared with the previous schemes

6.2 Area Overhead for Decoder Circuit

The area overhead for decompression circuit is shown in Table 12. The standard library for TSMC 0.18 μm process [1] is applied to calculate the area overhead.

Table 12 Area overhead for decoder circuit

As shown in Table 12, the proposed scheme consumes a little more area overhead than that of Partial LFSR reseeding. Nevertheless, it increases compression ratio on average owing to block-based coding during Scan-in. In addition, area overhead of the decompression architecture is generally acceptable compared with the prior schemes.

6.3 Test Power Consumption Analysis

Test power includes shift power and capture power in BIST scheme. Actually, shift mode including shift-in mode and shift-out mode occupies most test time during chip testing, so the shift power determines the average power of circuit under testing [18]. The timing diagram of shift-in and shift out in test per-scan architecture is presented in Fig. 9. Here, CLK1 is the at-speed clock in CUT to be applied in capture phase, CLK2 is the shift clock signal to control test vectors are shifted-in/out of the scan chains, and CLK3 is the clock signal that the sequential elements on the scan chain will receive. It obviously noted that the shift-in power is associated with test vectors and shift-out power is associated with test responses.

Fig. 9
figure 9

Timing diagram for test vectors shift-in and shift-out

To estimate how shift powers associates with test patterns, they are calculated by the weighted transition metric (WTM) [2], whose value reflects the inversion states of circuit under test, so it is used to evaluate the scan shift power caused by the corresponding test patterns.

Assuming that test set and test response are T = {T1, T2,T3,⋯, Tm} and R = {R1, R2,R3,⋯, Rm}, respectively. The length of each test cube is represented as Ti = {ti1, ti2, ti3, ⋯, tin}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, tin denotes the n-th bit in the i-th test pattern. Hence, WTM of shift-in and shift-out are estimated as follows:

$$ WT{M}_{i-\mathrm{in}}=\sum \limits_{i=1}^{n-1}\left(n-j\right)\ast \left({t}_{i,j}\oplus {t}_{i,j+1}\right) $$
(4)
$$ WT{M}_{i-\mathrm{out}}=\sum \limits_{i=1}^{n-1}j\ast \left({R}_{i,j}\oplus {R}_{i,j+1}\right) $$
(5)

For N test patterns, the average shift-in/out power consumption Pshift − in and Pshift − out are respectively estimated as follows:

$$ {P}_{shift- in}=\frac{1}{N}\sum \limits_{i=1}^N WT{M}_{i-\mathrm{in}}=\frac{1}{N}\sum \limits_{i=1}^N\sum \limits_{j=1}^{n-1}\left(n-j\right)\ast \left({t}_{i,j}\oplus {t}_{i,j+1}\right) $$
(6)
$$ {P}_{shift-\mathrm{out}}=\frac{1}{N}\sum \limits_{i=1}^N WT{M}_{i-\mathrm{out}}=\frac{1}{N}\sum \limits_{i=1}^N\sum \limits_{j=1}^{n-1}j\ast \left({R}_{i,j}\oplus {R}_{i,j+1}\right) $$
(7)

Peak scan-in power consumption Ppeak are estimated as follows:

$$ {P}_{peak}={\max}_{1\le \mathrm{i}\le \mathrm{m}} WT{M}_i $$
(8)

Consequently, the average and peak power consumption of the proposed scheme for shift-in and shift-out mode are evaluated in Table 13.

Table 13 Shift power consumption evaluation of the proposed scheme

The compression ratio and shift-in power consumption are compared with other schemes in Table 14. Compared with the former three schemes, the proposed scheme has obvious advantages in test compression ratio for each circuit. In addition, it significantly outperformed in shift-in power reduction than the first and third schemes and slightly underperformed in shift-in power reduction than the second scheme.

Table 14 Comparison of compression ratio and shift-in power consumption with other schemes

Overall, the proposed LFSR reseeding-based test compression scheme shows a trade-off between the shift-in power consumption reduction and test compression ratio.

7 Conclusion

The LFSR reseeding-oriented deterministic BIST scheme is an efficient technology for compressing test data. The proposed low-power test-compression scheme combines clustering scan-block with updating hold flag, which provides a valid way to significantly reduce test storage and test power consumption for scan design. In addition, the optimized encoding algorithm can be applied to either a BIST environment or test compression schemes based on LFSR reseeding to satisfy certain test power constraints, so it is practical to test large-scale industrial circuit.