
1 Introduction

Group testing was proposed by an economist, Robert Dorfman, who tried to solve the problem of identifying which draftees had syphilis [1] during WWII. Nowadays, it is known as the problem of finding up to d defective items among a colossal number n of items by testing t subsets of the n items. It can also be translated into the classification of up to d defective items and at least \(n - d\) negative items in a set of n items. The meanings of “items”, “defective items”, and “tests” depend on the context. Normally, a test on a subset of items (a test for short) is positive if the subset has at least one defective item, and negative otherwise. There are two main approaches to test design: adaptive and non-adaptive. In adaptive group testing, the design of a test depends on the outcomes of earlier tests. With this approach, the number of tests can be theoretically optimized [2]. However, such sequential testing takes a long time to carry out. Therefore, non-adaptive group testing (NAGT) [2, 3] is preferable: all tests are designed in advance and performed in parallel. The proliferation of NAGT applications in various fields, such as DNA library screening [4], multiple-access channels [5], data streaming [6], and neuroscience [7], has made it increasingly attractive recently. We thus focus on NAGT in this work.

The development of NAGT applications in the field of molecular biology led to the introduction of another type of item: inhibitors. An item is considered to be an inhibitor if it interferes with the identification of defective items in a test, i.e., a test containing at least one inhibitor item returns a negative outcome. In this “Group Testing with Inhibitors (GTI)” model, the outcome of a test on a subset of items is positive iff the subset has at least one defective item and no inhibitors. Due to its great potential for applications, the GTI model has been intensively studied for the last two decades [8,9,10,11].

In NAGT using the GTI model (NAGTI), if t tests are needed to identify up to d defective items and up to h inhibitors among n items, they can be seen as comprising a \(t \times n\) measurement matrix. The procedure for obtaining the matrix is called the construction procedure. The procedure for obtaining the outcomes of the t tests using the matrix is called the encoding procedure, and the procedure for recovering the defective items and the inhibitor items from the t outcomes is called the decoding procedure. Since noise typically occurs in biological experiments, we assume that there are up to e erroneous outcomes among the test outcomes. The objective of NAGTI is to efficiently classify all items through the encoding and decoding procedures in the presence of noise.

There are two approaches when using NAGTI. One is to identify defective items only. Chang et al. [12] proposed a scheme using \(O((d+h + e)^2\log _2{n})\) tests to identify all defective items in time \(O((d+h + e)^2 n\log _2{n})\). Using a probabilistic scheme, Ganesan et al. [13] reduced the number of tests to \(O((d + h) \log _2{n})\) and the decoding time to \(O((d + h)n \log _2{n})\). However, this scheme is applicable only in a noise-free setting, which is restrictive in practice. The second approach is to identify both defective items and inhibitors. Chang et al. [12] proposed a scheme using \(O( e(d+h)^3\log _2{n})\) tests to classify n items in time \(O( e(d+h)^3 n \log _2{n})\). Without considering the presence of noise in the test outcomes, Ganesan et al. [13] used \(O((d + h^2) \log _2{n})\) tests to identify at most d defective items and at most h inhibitor items in time \(O((d + h^2)n \log _2{n})\).

1.1 Problem Definition

We address two problems. The first is how to efficiently identify defective items from the test outcomes in the presence of noise. The second is how to efficiently identify both defective items and inhibitor items from the test outcomes in the presence of noise. Let z be an odd integer and \(e = \frac{z\,-\,1}{2}\) be the maximum number of errors in the test outcomes.

Problem 1

There are n items including up to d defective items and up to h inhibitor items. Is there a measurement matrix such that

  • All defective items can be identified in time \(\mathsf {poly}(d, h, e, \log _2{n})\) in the presence of up to e erroneous outcomes, where the number of rows in the measurement matrix is much smaller than n?

  • Each column of the matrix can be nonrandomly generated in time polynomial in the number of rows?

Problem 2

There are n items including up to d defective items and up to h inhibitor items. Is there a measurement matrix such that

  • All defective items and inhibitor items can be identified in time \(\mathsf {poly}(d, h, e, \log _2{n})\) in the presence of up to e erroneous outcomes, where the number of rows in the measurement matrix is much smaller than n?

  • Each column of the matrix can be nonrandomly generated in time polynomial in the number of rows?

We note that some previous works such as [14, 15] do not consider inhibitor items. In these works, Problems 1 and 2 can be reduced to the same problem by eliminating all terms related to “inhibitor items.”

1.2 Problem Model

We model NAGTI as follows. Suppose that there are up to d defectives (\(d \ge 1\)) and up to h inhibitors (\(h \ge 0\)) among n items. Let \(\mathbf {x}= (x_1, \ldots , x_n)^T \in \{0, 1, -\infty \}^n\) be the vector representation of the n items. Note that the number of defective items must be at least one; otherwise, all designed tests would yield negative outcomes. Item j is defective iff \(x_j = 1\), an inhibitor iff \(x_j = -\infty \), and negative iff \(x_j = 0\). Suppose that there are at most d 1’s in \(\mathbf {x}\), i.e., \(\left| D = \{ j \mid x_j = 1, \text{ for } j = 1, \ldots , n \} \right| \le d\), and at most h \(-\infty \)’s in \(\mathbf {x}\), i.e., \(\left| H = \{ j \mid x_j = -\infty , \text{ for } j = 1, \ldots , n \} \right| \le h\).

Let \(\mathcal {Q}= (q_{ij})\) be a \(q \times n\) binary measurement matrix which is used to identify defectives and inhibitors among the n items. Item j is represented by column j of \(\mathcal {Q}\) \((\mathcal {Q}_j)\) for \(j = 1, \ldots , n\). Test i is represented by row i, in which \(q_{ij} = 1\) iff item j belongs to test i, and \(q_{ij} = 0\) otherwise, for \(i = 1, \ldots , q\). Then the outcome vector using the measurement matrix \(\mathcal {Q}\) is

$$\begin{aligned} \mathbf {r}= \mathcal {Q}\otimes \mathbf {x}= \begin{bmatrix} r_1 \\ \vdots \\ r_q \end{bmatrix}, \end{aligned}$$
(1)

where \(\otimes \) is called the NAGTI operator, test outcome \(r_i = 1\) iff \(\sum _{j = 1}^n q_{ij} x_j \ge 1\), and \(r_i = 0\) otherwise for \(i = 1, \ldots , q.\) Note that we assume \(0 \times (-\infty ) = 0\) and there may be at most e erroneous outcomes in \(\mathbf {r}\).
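As a quick illustration, here is a minimal Python sketch of the operator \(\otimes \) (the \(3 \times 4\) matrix and the vector below are hypothetical, chosen only to exhibit the three outcome cases):

```python
NEG_INF = float("-inf")  # represents an inhibitor entry of x

def nagti_outcome(Q, x):
    """Compute r = Q (x) x: r_i = 1 iff sum_{j: q_ij = 1} x_j >= 1.

    Summing only over entries with q_ij = 1 enforces the convention
    0 * (-inf) = 0. A single inhibitor (-inf) in a test drives the sum
    to -inf, so the test is negative even if it contains a defective.
    """
    return [1 if sum(x[j] for j in range(len(x)) if row[j] == 1) >= 1 else 0
            for row in Q]

# Items: 1 = defective, -inf = inhibitor, 0 = negative.
x = [1, NEG_INF, 0, 0]
Q = [[1, 0, 1, 0],   # contains the defective only -> positive
     [1, 1, 0, 0],   # defective + inhibitor       -> negative
     [0, 0, 1, 1]]   # negatives only              -> negative
print(nagti_outcome(Q, x))  # [1, 0, 0]
```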

Let \(\mathbf {y}_w = (y_{1w}, y_{2w}, \ldots , y_{Bw})^T\) for \(w=1, \ldots , l\) be l binary vectors, for some integer \(B \ge 1\). The union of \(\mathbf {y}_1, \ldots , \mathbf {y}_l\) is defined as the vector \(\mathbf {y} = \vee _{i = 1}^l \mathbf {y}_i = (\vee _{i = 1}^l y_{1i}, \ldots , \vee _{i = 1}^l y_{Bi})^T\), where \(\vee \) is the OR operator. Then, when vector \(\mathbf {x}\) is binary, i.e., there are no inhibitors among the n items, (1) can be represented as

$$\begin{aligned} \mathbf {r}= \mathcal {Q}\otimes \mathbf {x}= \bigvee _{j = 1}^n x_j \mathcal {Q}_j = \bigvee _{j \in D} \mathcal {Q}_j. \end{aligned}$$
(2)

Our objective is to design the matrix \(\mathcal {Q}\) such that vector \(\mathbf {x}\) can be recovered from \(\mathbf {r}\) in time \(\mathsf {poly}(q) = \mathsf {poly}(d, h, e, \log _2{n}).\)

1.3 Our Contributions

Overview: Our objective is to reduce the decoding complexity for identifying up to d defectives and/or up to h inhibitors in the presence of up to e erroneous test outcomes. We present two deterministic schemes that efficiently solve both Problems 1 and 2 with probability 1. These schemes rest on two basic ideas: each column of a \(t_1 \times n\) \((d + h, r; z]\)-disjunct matrix (defined later) must be generated in time \(\mathsf {poly}(t_1)\), and the final measurement matrix is the tensor product (defined later) between such a matrix and a special signature matrix. These ideas reduce the decoding complexity to \(\mathsf {poly}(t_1)\). Moreover, the measurement matrices used in our proposed schemes are nonrandom, i.e., their columns can be nonrandomly generated in time polynomial in the number of rows. As a result, one can save space for storing the measurement matrices. Simulation results confirm our theoretical analysis. When the number of items is sufficiently large, the decoding time of our proposed scheme is the smallest in comparison with existing work.

Comparison: We compare our proposed schemes with existing schemes in Table 1. Six criteria are considered here. The first is the construction type, which defines how a measurement matrix is obtained. It also affects how defectives and inhibitors are identified. The most common construction type is random, i.e., a measurement matrix is generated randomly. All six schemes evaluated here use random constructions except for our proposed schemes.

The second criterion is the decoding type: “Deterministic” means the decoding objectives are always achieved with probability 1, while “Randomized” means they are achieved with some high probability. Ganesan et al. [13] used randomized decoding schemes to identify defectives and inhibitors. The schemes in [12] and our proposed schemes use deterministic decoding.

The remaining criteria are: identification of defective items only, identification of both defective items and inhibitor items, error tolerance, the number of tests, and the decoding complexity. The only advantage of the schemes proposed by Ganesan et al. [13] is that their number of tests is smaller than ours. Our schemes outperform the existing schemes on the other criteria, namely error tolerance, decoding type, and decoding complexity. The number of tests with our proposed scheme for identifying defective items only (both defective items and inhibitor items, resp.) is smaller (larger, resp.) than that with the scheme proposed by Chang et al. [12]. The decoding complexity of our proposed schemes is much lower than theirs when the number of items is sufficiently large.

Table 1. Comparison with existing schemes. “Deterministic” and “Randomized” are abbreviated as “Det.” and “Rnd.”. The \(\surd \) sign means that the criterion holds for that scheme, while the \(\times \) sign means that it does not. We set \(e = \frac{z\,-\,1}{2}\), \(\lambda = \frac{(d\,+\,h) \ln {n}}{\mathsf {W}((d\,+\,h)\ln {n})} + z\), and \(\alpha = \max \left\{ \frac{\lambda }{(d\,+\,h)^2}, 1 \right\} \), where \(\mathsf {W}(x)= \varTheta \left( \ln {x} - \ln {\ln {x}} \right) .\)

2 Preliminaries

Notation is defined here for consistency. We use capital calligraphic letters for matrices, lowercase letters for scalars, bold letters for vectors, and capital letters for sets. A capital letter with an asterisk denotes a multiset, in which elements may appear multiple times. For example, \(S = \{1, 2, 3 \} \) is a set and \(S^* = \{1, 1, 2, 3 \}\) is a multiset. Here we assume \(0 \times (-\infty ) = 0\).

Some frequent notations are listed as follows:

  • n, d: number of items; maximum number of defective items. For simplicity, we suppose that n is a power of 2.

  • \(|\cdot |\): the weight, i.e., the number of non-zero entries in the input vector or the cardinality of the input set.

  • \(\otimes , \circledcirc \): operator for NAGTI and tensor product, respectively.

  • [n]: \(\{1, 2, \ldots , n \}.\)

  • \(\mathcal {S}\): \(s \times n\) measurement matrix used to identify at most one defective item or one inhibitor item, where \(s = 2\log _2{n}\).

  • \(\mathcal {M}= (m_{ij})\): \(m \times n\) disjunct matrix, where integer \(m \ge 1\) is number of tests.

  • \(\mathcal {T} = (t_{ij})\): \(t \times n\) measurement matrix used to identify at most d defective items, where integer \(t \ge 1\) is number of tests.

  • \(\mathbf {x}; \mathbf {y}\): representation of n items; binary representation of the test outcomes.

  • \(\mathcal {S}_j, \mathcal {M}_j, \mathcal {M}_{i,*}\): column j of matrix \(\mathcal {S}\), column j of matrix \(\mathcal {M}\), and row i of matrix \(\mathcal {M}\).

  • D, H: index set of defective items; index set of inhibitor items.

  • \(\mathsf {supp}(\mathbf {c})\): support set of vector \(\mathbf {c} = (c_1, \ldots , c_k)\), i.e., \(\mathsf {supp}(\mathbf {c}) = \{j \mid c_j \ne 0 \}\). For example, the support set of \(\mathbf {v} = (1, 0, 0, -\infty )\) is \(\mathsf {supp}(\mathbf {v}) = \{1, 4 \}\).

  • \(\mathrm {diag}(\mathcal {M}_{i, *}) = \mathrm {diag}(m_{i1}, m_{i2}, \ldots , m_{in})\): diagonal matrix constructed from input vector \(\mathcal {M}_{i, *} = (m_{i1}, m_{i2}, \ldots , m_{in})\).

  • \(\mathrm {e}; \log ; \ln \): base of natural logarithm; logarithm of base 2; natural logarithm.

  • \(\lceil x \rceil ; \lfloor x \rfloor \): ceiling function of x; floor function of x.

  • \(\mathsf {W}(x)\): the Lambert W function in which \(\mathsf {W}(x)\mathrm {e}^{\mathsf {W}(x)} = x\).

2.1 Tensor Product

Let \(\circledcirc \) be the tensor product notation. Note that the tensor product defined here is not the usual tensor product used in linear algebra. Given an \(a \times n\) matrix \(\mathcal {A}= (a_{ij})\) and an \(s \times n\) matrix \(\mathcal {S}= (s_{ij})\), their tensor product is defined as

$$\begin{aligned} \mathcal {R} = \mathcal {A}\circledcirc \mathcal {S}:= \begin{bmatrix} \mathcal {S}\times \mathrm {diag}(\mathcal {A}_{1, *}) \\ \vdots \\ \mathcal {S}\times \mathrm {diag}(\mathcal {A}_{a, *}) \end{bmatrix} = \begin{bmatrix} a_{11} \mathcal {S}_1&\ldots&a_{1n} \mathcal {S}_n \\ \vdots&\ddots&\vdots \\ a_{a1} \mathcal {S}_1&\ldots&a_{an} \mathcal {S}_n \end{bmatrix}, \end{aligned}$$
(3)

where \(\mathrm {diag}(\cdot )\) is the diagonal matrix constructed from the input vector, and \(\mathcal {A}_{i, *} = (a_{i1}, \ldots , a_{in})\) is the ith row of \(\mathcal {A}\) for \(i = 1, \ldots , a\). The size of \(\mathcal {R}\) is \(r \times n\), where \(r = a \times s\).
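A minimal Python sketch of this product (the two \(2 \times 3\) matrices below are hypothetical examples):

```python
def diag(v):
    """Diagonal matrix with the entries of v on the main diagonal."""
    n = len(v)
    return [[v[j] if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def tensor(A, S):
    """R = A (o) S: stack S x diag(A_{i,*}) for each row i of A.

    Block i of R holds a_{ij} * S_j in column j, so R has
    rows(A) * rows(S) rows, matching Eq. (3).
    """
    R = []
    for row in A:
        R.extend(matmul(S, diag(row)))
    return R

A = [[1, 0, 1],
     [0, 1, 1]]       # 2 x 3
S = [[1, 0, 1],
     [0, 1, 1]]       # 2 x 3
R = tensor(A, S)      # (2*2) x 3
print(len(R))         # 4 rows
```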

2.2 Reed-Solomon Codes

Let \(n_1, r_1, \varLambda , q\) be positive integers. Let \(\varSigma \) be a finite field with \(|\varSigma | = q\). From now on, we set \(\varSigma = \mathbb {F}_q\). Each codeword is considered as a vector in \(\mathbb {F}_q^{n_1 \times 1}\). Let C be a subset of \(\varSigma ^{n_1}\). Assume that for any \(\mathbf {y}\in C\), there exists a message \(\mathbf {x}\in \mathbb {F}_q^{r_1}\) such that \(\mathbf {y}= \mathcal {G}\mathbf {x}\), where \(\mathcal {G}\) is a full-rank \(n_1 \times r_1\) matrix over \(\mathbb {F}_q\). Then C is called a linear code with minimum distance \(\varLambda = \min _{\mathbf {y}\in C, \mathbf {y} \ne \mathbf {0}} |\mathsf {supp}(\mathbf {y})|\) and is denoted as \([n_1, r_1, \varLambda ]_q\). The cardinality of C is \(q^{r_1}\). Let \(\mathcal {M}_C\) denote the \(n_1 \times q^{r_1}\) matrix whose columns are the codewords in C.

An \([n_1, r_1, \varLambda ]_q\)-Reed-Solomon (RS) code [16] is an \([n_1, r_1, \varLambda ]_q\) code with \(\varLambda = n_1 - r_1 + 1\). Since the parameter \(\varLambda \) can be obtained from \(n_1\) and \(r_1\), we usually refer to an \([n_1, r_1, \varLambda ]_q\)-RS code as an \([n_1, r_1]_q\)-RS code.

2.3 Disjunct Matrix

Superimposed codes were introduced by Kautz and Singleton [17] and later generalized by D’yachkov et al. [18] and Stinson and Wei [19]. A superimposed code is defined as follows.

Definition 1

An \(m \times n\) binary matrix \(\mathcal {M}\) is called a \((d, r; z]\)-superimposed code if for any two disjoint subsets \(S_1, S_2 \subset [n]\) such that \(|S_1| = d\) and \(|S_2| = r\), there exist at least z rows in which all the columns in \(S_2\) have 1’s while all the columns in \(S_1\) have 0’s, i.e., \( \left| \bigcap _{j \in S_2} \mathsf {supp}\left( \mathcal {M}_j \right) \big \backslash \bigcup _{j \in S_1} \mathsf {supp}\left( \mathcal {M}_j \right) \right| \ge z.\)

Matrix \(\mathcal {M}\) is usually referred to as a \((d, r; z]\)-disjunct matrix. The parameter \(e = \lfloor (z-1)/2 \rfloor \) is referred to as the error tolerance of the disjunct matrix. It is clear that for any \(d^\prime \le d\), \(r^\prime \le r\), and \(z^\prime \le z\), a \((d, r; z]\)-disjunct matrix is also a \((d^\prime , r^\prime ; z^\prime ]\)-disjunct matrix.
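Definition 1 can be checked directly by brute force for tiny matrices. The Python sketch below (exponential in d and r, so for illustration only) tests the condition for every admissible pair \((S_1, S_2)\):

```python
from itertools import combinations

def is_disjunct(M, d, r, z):
    """Brute-force check of Definition 1: M is (d, r; z]-disjunct iff for
    every pair of disjoint column sets S1 (|S1| = d) and S2 (|S2| = r)
    there are at least z rows with 1's in every column of S2 and 0's in
    every column of S1."""
    n = len(M[0])
    for S2 in combinations(range(n), r):
        rest = [j for j in range(n) if j not in S2]
        for S1 in combinations(rest, d):
            good = sum(1 for row in M
                       if all(row[j] == 1 for j in S2)
                       and all(row[j] == 0 for j in S1))
            if good < z:
                return False
    return True

# The 4 x 4 identity matrix is (3, 1; 1]-disjunct: each column has a row
# in which it alone carries a 1.
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(is_disjunct(I4, 3, 1, 1))  # True
```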

Let \(\mathbf {x}= (x_1, \ldots , x_n)^T \in \{0, 1 \}^n\) be the binary representation vector of n items, where \(|\mathbf {x}| \le d\). From (2), the outcome vector of m tests by using \(\mathcal {M}\) and \(\mathbf {x}\) is defined as follows:

$$\begin{aligned} \mathbf {y} = \mathcal {M}\otimes \mathbf {x} = \bigvee _{j = 1}^n x_j \mathcal {M}_j = \bigvee _{j \in D} \mathcal {M}_j, \end{aligned}$$
(4)

where \(D = \mathsf {supp}(\mathbf {x}) = \{j \mid x_j = 1 \}.\) The procedure to get \(\mathbf {y}\) is called the encoding procedure. It includes the construction procedure, which is to obtain a measurement matrix \(\mathcal {M}.\) The procedure to recover \(\mathbf {x}\) from \(\mathbf {y}\) and \(\mathcal {M}\) is called the decoding procedure. Our objective is to recover \(\mathbf {x}\) when the outcome vector \(\mathbf {y}\) and the matrix \(\mathcal {M}\) are given.

The number of rows in an \(m \times n\) \((d, r; z]\)-disjunct matrix is usually exponential in d [15, 20]. Cheraghchi [21] proposed a nonrandom construction for \((d, r; z]\)-disjunct matrices in which the number of tests becomes larger than in existing works as d or r increases.

Theorem 1

(Lemma 29 [21]). For any positive integers d, r, z, and n with \(d + r \le n\), there exists an \(m \times n\) nonrandom \((d, r; z]\)-disjunct matrix where \(m = O \left( (rd \ln {n} + z)^{r + 1} \right) \). Moreover, each column of the matrix can be generated in time \(\mathsf {poly}(m).\)

A \((d, r; z]\)-disjunct matrix is called a \((d; z]\)-disjunct matrix when \(r = 1\), and a d-disjunct matrix when \(r = z = 1\). For efficient decoding in the NAGTI model, we pay attention only to \(m \times n\) binary \((d, r; z]\)-disjunct matrices in which each column can be generated in time \(\mathsf {poly}(m)\).

2.4 Bui et al.’s Scheme

In this section, the scheme proposed by Bui et al. [14] is described. Its main contribution is that, given any \(m \times n\) \((d - 1)\)-disjunct matrix, a larger \(t \times n\) measurement matrix can be generated such that up to d defective items (in a set of n items containing only defective and negative items) can be identified in time \(O(t) = O (m \log {n} )\), where \(t = 2m \log {n}\).

Encoding procedure: Let \(\mathcal {S}\) be an \(s \times n\) measurement matrix:

$$\begin{aligned} \mathcal {S}:= \begin{bmatrix} \mathbf {b}_1&\mathbf {b}_2&\ldots&\mathbf {b}_n \\ \overline{\mathbf {b}}_1&\overline{\mathbf {b}}_2&\ldots&\overline{\mathbf {b}}_n \end{bmatrix} = \begin{bmatrix} \mathcal {S}_1&\ldots&\mathcal {S}_n \end{bmatrix}, \end{aligned}$$
(5)

where \(s = 2\log {n}\), \(\mathbf {b}_j\) is the \(\log {n}\)-bit binary representation of the integer \(j-1\), \(\overline{\mathbf {b}}_j\) is the complement of \(\mathbf {b}_j\), and \(\mathcal {S}_j := \begin{bmatrix} \mathbf {b}_j \\ \overline{\mathbf {b}}_j \end{bmatrix}\) for \(j = 1,2,\ldots , n\). Item j is characterized by column \(\mathcal {S}_j\), and the weight of every column in \(\mathcal {S}\) is \(s/2 = \log {n}.\) Furthermore, the index j is uniquely identified by \(\mathbf {b}_j\).
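A short Python sketch of this construction (assuming, as in the text, that n is a power of 2):

```python
def signature_matrix(n):
    """Build the s x n matrix S of (5): column S_j = [b_j ; complement(b_j)],
    where b_j is the (log2 n)-bit binary representation of j - 1 and
    s = 2 log2 n. Requires n to be a power of 2."""
    k = n.bit_length() - 1          # log2(n)
    cols = []
    for j in range(1, n + 1):
        b = [(j - 1 >> (k - 1 - i)) & 1 for i in range(k)]
        cols.append(b + [1 - bit for bit in b])
    # transpose the list of columns into an s x n row-major matrix
    return [list(row) for row in zip(*cols)]

S = signature_matrix(8)             # 6 x 8
col3 = [row[2] for row in S]        # item 3: b_3 = 010, complement 101
print(col3)                         # [0, 1, 0, 1, 0, 1]
```

Every column has weight exactly \(s/2 = \log n\), which is what the decoding rule below exploits.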

Given an \(m \times n\) \((d-1)\)-disjunct matrix \(\mathcal {M}\), the new \(t \times n\) measurement matrix is constructed as follows:

$$\begin{aligned} \mathcal {T}= \mathcal {M}\circledcirc \mathcal {S}, \end{aligned}$$
(6)

where \(\circledcirc \) is the tensor product defined in Sect. 2.1 and \(t = ms\). For any binary input vector \(\mathbf {x}\), its outcome using measurement matrix \(\mathcal {T}\) is

$$\begin{aligned} \mathbf {y}= \mathcal {T}\otimes \mathbf {x}= \begin{bmatrix} \mathbf {y}_1 \\ \vdots \\ \mathbf {y}_m \end{bmatrix}, \end{aligned}$$
(7)

where \(\mathbf {y}_i = \left( \mathcal {S}\times \mathrm {diag}(\mathcal {M}_{i, *}) \right) \otimes \mathbf {x}= \bigvee _{j = 1}^n x_j m_{ij} \mathcal {S}_j\) for \(i = 1, \ldots , m\).

Decoding Procedure: The decoding procedure is quite simple. We scan all \(\mathbf {y}_i\) for \(i = 1, \ldots , m\). If \(\mathrm {wt}(\mathbf {y}_i) = \log {n}\), a defective item can be identified by decoding the first half of \(\mathbf {y}_i\). Otherwise, no defective item is identified. The procedure is described in Algorithm 1.
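The core test of Algorithm 1 can be sketched as follows (a simplified single-block version; the function name is ours, not the paper's):

```python
def get_defective(y_i, n):
    """One step of the decoding sketch: if wt(y_i) = log2 n, then y_i equals
    a single signature column S_j, and its first half is the binary
    representation of j - 1. A weight of 0 means no defective in the test;
    a weight above log2 n means at least two columns were OR-ed together,
    so no item is identified from this block."""
    k = n.bit_length() - 1
    if sum(y_i) != k:
        return None
    return int("".join(map(str, y_i[:k])), 2) + 1

n = 8
# Block for a test containing only defective item 5: y = S_5 = (1,0,0,0,1,1).
print(get_defective([1, 0, 0, 0, 1, 1], n))  # 5
# Union of S_3 and S_5 has weight 5 > 3, so nothing is identified.
print(get_defective([1, 1, 0, 1, 1, 1], n))  # None
```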

[Algorithm 1 — figure omitted]

This scheme can be summarized as the following theorem:

Theorem 2

Let an \(m \times n\) matrix \(\mathcal {M}\) be \((d - 1)\)-disjunct. Suppose that a set of n items has up to d defective items and no inhibitors. Then there exists a \(t \times n\) matrix \(\mathcal {T}\) constructed from \(\mathcal {M}\) that can be used to identify up to d defective items in time \(t = m \times 2\log {n}\). Furthermore, suppose that each column of \(\mathcal {M}\) can be computed in time \(\beta \). Then every column of \(\mathcal {T}\) can be computed in time \(2\log {n} \times \beta = O(\beta \log {n}).\)

Algorithm 1 is modified and denoted as \(\mathrm {GetDefectives}^*(\mathbf {y}, n)\) if we substitute the set S with the multiset \(S^*\); i.e., the output of \(\mathrm {GetDefectives}^*(\cdot )\) may contain duplicated items, which are used to handle the presence of erroneous outcomes in Sects. 4 and 5. Line 7 is then interpreted as “Add \(d_0\) to multiset \(S^*\)”.

3 Improved Instantiation of Nonrandom \((d, r; z]\)-Disjunct Matrices

We first state a useful nonrandom construction of \((d, r; z]\)-disjunct matrices, which is an instance of Theorem 1:

Theorem 3

(Lemma 29 [21]). Let \(1 \le d, r, z < n\) be integers and C be an \([n_1 = q-1, k_1]_q\)-RS code. For any \(d < \frac{n_1 - z}{r(k_1 - 1)} = \frac{q - 1 - z}{r(k_1 - 1)}\) and \(n \le q^{k_1}\), there exists a \(t \times n\) nonrandom \((d, r; z]\)-disjunct matrix with \(t = O \left( q^{r + 1} \right) \). Moreover, each column of the matrix can be constructed in time \(O \left( q^{r + 2}/(r^2 d^2) \right) \).

An approximation of the Lambert W function \(\mathsf {W}(x)\) [22] is \(\ln {x} - \ln {\ln {x}} \le \mathsf {W}(x)\le \ln {x} - \frac{1}{2} \ln {\ln {x}}\) for any \(x \ge \mathrm {e}\). Then an improved instantiation of nonrandom \((d, r; z]\)-disjunct matrices is stated as follows:

Corollary 1

For any positive integers d, r, z, and n with \(d + r \le n\), there exists a \(t \times n\) nonrandom \((d, r; z]\)-disjunct matrix with \(t = \varTheta \left( \lambda ^{r + 1} \right) \), where \(\lambda = (rd \ln {n})/(\mathsf {W}(d\ln {n})) + z\). Moreover, each column of the matrix can be constructed in time \(O \left( \lambda ^{r + 2}/(r^2 d^2) \right) .\)

Proof

From Theorem 3, we only need to find an \([n_1 = q-1, k_1]_q\)-RS code such that \(d < \frac{n_1 - z}{r(k_1 - 1)} = \frac{q - 1 - z}{r(k_1 - 1)}\) and \(q^{k_1} \ge n.\) One chooses

$$\begin{aligned} q = {\left\{ \begin{array}{ll} \frac{rd \ln {n}}{\mathsf {W}(d\ln {n})} + z + 1, &{} {\text {if}}\, \frac{rd \ln {n}}{\mathsf {W}(d\ln {n})} + z + 1\, {\text {is a power of 2,}} \\ 2^{\eta + 1}, &{}{\text {otherwise,}} \end{array}\right. } \end{aligned}$$
(8)

where \(\eta \) is an integer satisfying \(2^\eta< \frac{rd \ln {n}}{\mathsf {W}(d\ln {n})} + z + 1 < 2^{\eta + 1}\). We have \(q = \varTheta \left( \frac{rd \ln {n}}{\mathsf {W}(d\ln {n})} + z \right) \) in both cases because \( \frac{rd \ln {n}}{\mathsf {W}(d\ln {n})} + z + 1 \le q < 2 \left( \frac{rd \ln {n}}{\mathsf {W}(d\ln {n})} + z + 1 \right) .\)

Set \(k_1 = \left\lceil \frac{q - z - 1}{rd} \right\rceil \ge \frac{\ln {n}}{\mathsf {W}(d\ln {n})}\). Note that the condition on d in Theorem 3 always holds because:

$$\begin{aligned} k_1 = \left\lceil \frac{q - z - 1}{rd} \right\rceil \Longrightarrow k_1&< \frac{q - z - 1}{rd} + 1 \Longrightarrow d < \frac{q - 1 - z}{r (k_1 - 1)} = \frac{n_1 - z}{r (k_1 - 1)}. \end{aligned}$$

Finally, our task is to prove that \(n \le q^{k_1}\). Indeed, we have:

$$\begin{aligned} q^{k_1} \ge \left( \frac{rd \ln {n}}{\mathsf {W}(d\ln {n})} + z + 1 \right) ^{\frac{\ln {n}}{\mathsf {W}(d\ln {n})} }&\ge \left( \frac{d \ln {n}}{\mathsf {W}(d\ln {n})} \right) ^{\frac{\ln {n}}{\mathsf {W}(d\ln {n})}} \\&\ge \left( \mathrm {e}^{\mathsf {W}(d\ln {n}) \mathrm {e}^{\mathsf {W}(d\ln {n})}} \right) ^{1/d} = (\mathrm {e}^{d\ln {n}})^{1/d} = n. \end{aligned}$$

This completes our proof.
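The parameter choice in this proof can be sanity-checked numerically. The sketch below (with a simple Newton iteration standing in for \(\mathsf{W}(x)\), and always rounding q up to a power of 2 so that \(\mathbb{F}_q\) exists) verifies \(n \le q^{k_1}\) and the condition on d for one hypothetical setting:

```python
import math

def lambert_w(x):
    """Newton iteration for the principal branch of W(x), for x >= e."""
    w = math.log(x)
    for _ in range(50):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1))
    return w

def choose_q_k1(n, d, r, z):
    """Pick q as in (8), rounding the target rd ln(n)/W(d ln n) + z + 1 up
    to a power of 2, and set k1 = ceil((q - z - 1) / (r d))."""
    target = r * d * math.log(n) / lambert_w(d * math.log(n)) + z + 1
    q = 2 ** math.ceil(math.log2(target))
    k1 = math.ceil((q - z - 1) / (r * d))
    return q, k1

# Hypothetical setting: n = 2^16 items, d = 4, r = 1, z = 1.
n, d, r, z = 2 ** 16, 4, 1, 1
q, k1 = choose_q_k1(n, d, r, z)
assert q ** k1 >= n                       # n <= q^{k1}
assert d < (q - 1 - z) / (r * (k1 - 1))   # condition of Theorem 3
print(q, k1)                              # 32 8
```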

The number of tests in our construction is smaller than that in Theorem 1. However, there is no decoding scheme associated with the matrices in this corollary. When \(r = z = 1\), though, the scheme in [14] achieves the same number of tests and has an efficient decoding algorithm.

4 Identification of Defective Items

In this section, we answer Problem 1: there exists a \(t \times n\) measurement matrix such that it can handle at most e errors in the test outcomes; each column can be nonrandomly generated in time \(\mathsf {poly}(t)\); and all defective items can be identified in time \(\mathsf {poly}(d, h, e, \log {n})\), where there are up to d defective items and up to h inhibitor items among the n items. The main idea is to use the modified version of Algorithm 1 to identify all potential defective items. A sanitization procedure is then performed to remove all false defectives.

Theorem 4

Let \(1 \le d, h, d\,+\,h \le n\) be integers, z be odd, and \(\lambda = \frac{(d + h) \ln {n}}{\mathsf {W}((d + h)\ln {n})} + z\). A set of n items includes up to d defective items and up to h inhibitors. Then there exists a \(t \times n\) nonrandom matrix such that up to d defective items can be identified in time \(O \left( \frac{\lambda ^5 \log {n}}{(d + h)^2} \right) \) with up to \(e = \frac{z-1}{2}\) errors in the test outcomes, where \(t = \varTheta \left( \lambda ^2 \log {n} \right) \). Moreover, each column of the matrix can be generated in time \(\mathsf {poly}(t)\).

The proof is given in the following sections.

4.1 Encoding Procedure

We set \(e = \frac{z-1}{2}\) and \(\lambda = \frac{(d\,+\,h) \ln {n}}{\mathsf {W}((d\,+\,h)\ln {n})} + z\). Let an \(m \times n\) matrix \(\mathcal {M}\) be a \((d + h; z]\)-disjunct matrix as in Corollary 1 (with \(r = 1\)), where

$$\begin{aligned} m = \varTheta \left( \left( \frac{(d + h) \ln {n}}{\mathsf {W}( (d + h) \ln {n})} + z \right) ^2 \right) = O( \lambda ^2). \end{aligned}$$

Each column in \(\mathcal {M}\) can be generated in time \(t_1 = O \left( \frac{\lambda ^3}{(d + h)^2} \right) \). Then the final \(t \times n\) measurement matrix \(\mathcal {T}\) is

$$\begin{aligned} \mathcal {T}= \mathcal {M}\circledcirc \mathcal {S}, \end{aligned}$$
(9)

where the \(s \times n\) matrix \(\mathcal {S}\) is defined in (5) and \(t = m s = \varTheta \left( \lambda ^2 \log {n} \right) \). Then it is easy to see that each column of \(\mathcal {T}\) can be generated in time \(t_1 \times s = \mathsf {poly}(t)\).

Any input vector \(\mathbf {x}= (x_1, \ldots , x_n)^T \in \{0, 1, -\infty \}^n\) contains at most d 1’s and at most h \(-\infty \)’s, as described in Sect. 1.2. Note that D and H are the index sets of the defective items and the inhibitor items, respectively. Then the binary outcome vector using the measurement matrix \(\mathcal {T}\) is \(\mathbf {y}= \mathcal {T}\otimes \mathbf {x}= \begin{bmatrix} \mathbf {y}_1 \\ \vdots \\ \mathbf {y}_m \end{bmatrix},\) where \(\mathbf {y}_i = \left( \mathcal {S}\times \mathrm {diag}(\mathcal {M}_{i, *}) \right) \otimes \mathbf {x}= \begin{bmatrix} y_{(i-1)s + 1} \\ \vdots \\ y_{is} \end{bmatrix},\) with \(y_{(i - 1)s + l} = 1\) iff \(\sum _{j = 1}^n m_{ij} s_{lj} x_j \ge 1\) and \(y_{(i - 1)s + l} = 0\) otherwise, for \(i = 1, \ldots , m\) and \(l = 1, \ldots , s\). We assume that there are at most e incorrect outcomes in the outcome vector \(\mathbf {y}\).

4.2 Decoding Procedure

Given the outcome vector \(\mathbf {y}= (\mathbf {y}_1, \ldots , \mathbf {y}_m)^T\), we can identify all defective items by using Algorithm 2. Step 1 identifies all potential defectives and puts them in the multiset \(S^*\). Steps 3 to 8 then remove duplicate items, yielding the potential defective set \(S_0.\) After that, Steps 9 to 16 remove all false defectives. Finally, Step 17 returns the defective set.
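The duplicate-removal idea of Steps 3 to 8 can be sketched as a multiplicity filter (a simplified stand-in; the verification of the remaining candidates against the actual tests, Steps 9 to 16, is omitted here):

```python
from collections import Counter

def filter_candidates(S_star, e):
    """Every true defective is produced by GetDefectives* at least e + 1
    times (it occurs in z = 2e + 1 good rows, of which at most e are
    corrupted), so keeping items of multiplicity >= e + 1 discards
    candidates that arise only from erroneous outcomes."""
    counts = Counter(S_star)
    return sorted(j for j, c in counts.items() if c >= e + 1)

# Hypothetical multiset from Step 1: items 3 and 5 appear >= e + 1 = 2
# times; item 7 comes from a single flipped test outcome.
print(filter_candidates([3, 5, 3, 7, 5, 3], e=1))  # [3, 5]
```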

[Algorithm 2 — figure omitted]

4.3 Correctness of Decoding Procedure

Since matrix \(\mathcal {M}\) is a \((d + h; z]\)-disjunct matrix, for each \(j \in D\) there are at least z rows \(i_0\) such that \(m_{i_0 j} = 1\) and \(m_{i_0 j^\prime } = 0\) for all \(j^\prime \in (D \cup H) \setminus \{ j \}.\) Since up to \(e = (z - 1)/2\) errors may appear in the test outcome \(\mathbf {y}\), there are at least \(e + 1\) vectors \(\mathbf {y}_{i_0}\) for which the condition in Step 5 of Algorithm 1 holds. Consequently, each value \(j \in D\) appears at least \(e + 1\) times. Therefore, Steps 1 to 8 return a set \(S_0\) containing all defective items and possibly some false defectives.

Steps 9 to 16 remove the false defectives. For any index \(j \not \in D\), since there are at most \(e = (z - 1)/2\) erroneous outcomes, there is at least one row \(i_0\) such that \(t_{i_0 j} = 1\), \(t_{i_0 j^\prime } = 0\) for all \(j^\prime \in (D \cup H) \setminus \{ j \}\), and the corresponding outcome is correct. Because item \(j \not \in D\), the outcome of that row (test) is negative (0). Therefore, Step 12 checks whether an item in \(S_0\) is non-defective. Finally, Step 17 returns the set of defective items.

4.4 Decoding Complexity

The time to run Step 1 is O(t). Since \(|S^*| \le m\), Steps 3 to 8 take O(m) time, and the cardinality of \(S_0\) is at most m. The loop at Step 9 thus runs at most m times. Steps 11 and 12 take time \(s \times \frac{m^{1.5}}{(d + h)^2} \) and t, respectively. The total decoding time is:

$$\begin{aligned} O(t) + m + m \times \left( s \times \frac{m^{1.5}}{(d + h)^2} + t \right)&= O \left( \frac{sm^{2.5}}{(d + h)^2} \right) = O \left( \frac{\lambda ^5 \log {n}}{(d + h)^2} \right) . \end{aligned}$$

5 Identification of Defectives and Inhibitors

In this section, we answer Problem 2: there exists a \(v \times n\) measurement matrix such that it can handle at most e errors in the test outcomes; each column can be nonrandomly generated in time \(\mathsf {poly}(v)\); and all defective items and inhibitor items can be identified in time \(\mathsf {poly}(d, h, e, \log {n})\), where there are up to d defective items and up to h inhibitor items among the n items.

Theorem 5

Let \(1 \le d, h, d + h \le n\) be integers, z be odd, and \(\lambda = \frac{(d + h) \ln {n}}{\mathsf {W}((d + h)\ln {n})} + z.\) A set of n items includes up to d defective items and up to h inhibitors. Then there exists a \(v \times n\) nonrandom matrix such that up to d defective items and up to h inhibitor items can be identified in time \(O \left( d \lambda ^6 \times \max \left\{ \frac{\lambda }{(d+h)^2}, 1 \right\} \right) \), with up to \(e = \frac{z-1}{2}\) errors in the test outcomes, where \(v = \varTheta \left( \lambda ^3 \log {n} \right) \). Moreover, each column of the matrix can be generated in time \(\mathsf {poly}(v)\).

To detect both up to h inhibitors and up to d defectives, we have to use two types of matrices: a \((d + h; z]\)-disjunct matrix and a \((d + h - 2, 2; z]\)-disjunct matrix. The main idea is as follows. We first identify all defective items. Then all potential inhibitors are located by using the \((d + h - 2, 2; z]\)-disjunct matrix. The final procedure removes all false inhibitor items.

5.1 Identification of an Inhibitor

Let \(\underline{\vee }\) denote the union of the column corresponding to a defective item and the column corresponding to an inhibitor item. Suppose that there is an outcome \(\mathbf {o}:= (o_1, \ldots , o_s)^T = \mathcal {S}_a \underline{\vee } \mathcal {S}_b\), where a is the defective item, b is the inhibitor item, and \(\mathcal {S}_a\) and \(\mathcal {S}_b\) are two columns of the \(s \times n\) matrix \(\mathcal {S}\) in (5). Note that \(o_i = 1\) iff \(s_{ia} = 1\) and \(s_{ib} = 0\), and \(o_i = 0\) otherwise, for \(i = 1, \ldots , s.\) Assume that the defective item a is already known. The inhibitor item b is then identified as in Algorithm 3.

figure c

The correctness of the algorithm is described here. Step 2 initializes the column of \(\mathcal {S}\) corresponding to inhibitor b. Since column \(\mathcal {S}_a\) has exactly s / 2 1’s, Steps 3 to 6 obtain s / 2 entries of \(\mathcal {S}_b\). Since the first half of \(\mathcal {S}_a\) is the complement of its second half, there do not exist two indices \(i_0\) and \(i_1\) with \(|i_0 - i_1| = \log {n}\) such that \(s_{i_0 a} = s_{i_1 a} = 1\). As a result, there do not exist two indices \(i_0\) and \(i_1\) with \(|i_0 - i_1| = \log {n}\) such that \(s_{i_0 b} = s_{i_1 b} = -1\). Moreover, the first half of \(\mathcal {S}_b\) is the complement of its second half. Therefore, the remaining s / 2 entries of \(\mathcal {S}_b\) can be obtained by Steps 7 to 11. The index of inhibitor b can then be identified by reading the first half of \(\mathcal {S}_b\), which is done in Step 12. Finally, Step 13 returns the index of the inhibitor.

It is easy to verify that the decoding complexity of Algorithm 3 is O(s).

Example: Let \(\mathcal {S}\) be the matrix in (5), where \(n = 8\) and \(s = 2 \log {n} = 6\). Suppose that item 1 is the unknown inhibitor, that item 3 is the known defective item, and that the observed vector is \(\mathbf {o}= (0, 1, 0, 0, 0, 0)^T.\) The corresponding column of the defective item is \(\mathcal {S}_3\). We set \(\mathcal {S}_b = (-1, -1, -1, -1, -1, -1)^T.\) We get \(\mathcal {S}_b = (-1, 0, -1, 1, -1, 1)^T\) from Steps 3 to 6 and the complete column \(\mathcal {S}_b = (0, 0, 0, 1, 1, 1)^T\) from Steps 7 to 11. Because the first half of \(\mathcal {S}_b\) is \((0, 0, 0)^T\), the index of the inhibitor is 1.
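The worked example above can be reproduced with a short sketch of Algorithm 3. The column layout of \(\mathcal {S}\) assumed here (the first half of column j is the \(\log n\)-bit binary representation of \(j - 1\), the second half its complement) is an assumption consistent with the example, since (5) is not reproduced in this section.

```python
def identify_inhibitor(o, S_a, half):
    """Sketch of Algorithm 3: recover the inhibitor column S_b from the
    outcome o = S_a (union) S_b, given the known defective column S_a."""
    s = len(S_a)
    S_b = [-1] * s                        # Step 2: mark all entries of S_b unknown
    for i in range(s):                    # Steps 3-6: o_i = 1 iff S_a[i] = 1 and S_b[i] = 0,
        if S_a[i] == 1:                   # so rows where S_a[i] = 1 reveal s/2 entries of S_b
            S_b[i] = 0 if o[i] == 1 else 1
    for i in range(half):                 # Steps 7-11: the first half of S_b is the
        if S_b[i] == -1:                  # complement of its second half
            S_b[i] = 1 - S_b[i + half]
        else:
            S_b[i + half] = 1 - S_b[i]
    # Step 12: the first half of S_b is the binary index of item b (items start at 1)
    return int("".join(map(str, S_b[:half])), 2) + 1, S_b

# Reproducing the worked example: n = 8, s = 6, defective item 3, inhibitor item 1.
# The column for item 3 (binary(2) followed by its complement) is an assumed layout.
item, S_b = identify_inhibitor([0, 1, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1], half=3)
print(item, S_b)   # 1 [0, 0, 0, 1, 1, 1]
```

The run recovers \(\mathcal {S}_b = (0, 0, 0, 1, 1, 1)^T\) and the inhibitor index 1, matching the example, in O(s) time as claimed.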

5.2 Encoding Procedure

We set \(e = \frac{z-1}{2}\) and \(\lambda = \frac{(d + h) \ln {n}}{\mathsf {W}((d + h)\ln {n})} + z\). Let an \(m \times n\) matrix \(\mathcal {M}\) and a \(g \times n\) matrix \(\mathcal {G}\) be a \((d + h; z]\)-disjunct matrix and a \((d + h - 2, 2; z]\)-disjunct matrix from Corollary 1, respectively, where

$$\begin{aligned} m&= \varTheta \left( \left( \frac{(d + h) \ln {n}}{\mathsf {W}( (d + h) \ln {n})} + z \right) ^2 \right) = \varTheta \left( \lambda ^2 \right) , \\ g&= \varTheta \left( \left( \frac{(d + h) \ln {n}}{\mathsf {W}( (d + h) \ln {n})} + z \right) ^3 \right) = \varTheta \left( \lambda ^3 \right) . \end{aligned}$$

Each column in \(\mathcal {M}\) and \(\mathcal {G}\) can be generated in time \(t_1\) and \(t_2\), respectively, where

$$\begin{aligned} t_1 = O \left( \frac{\lambda ^3}{(d + h)^2} \right) , t_2 = O \left( \frac{\lambda ^4}{(d + h)^2} \right) . \end{aligned}$$
(10)

The final \(v \times n\) measurement matrix \(\mathcal {V}\) is

$$\begin{aligned} \mathcal {V}= \begin{bmatrix} \mathcal {M}\circledcirc \mathcal {S}\\ \mathcal {G}\circledcirc \mathcal {S}\\ \mathcal {G}\end{bmatrix} = \begin{bmatrix} \mathcal {T}\\ \mathcal {H}\\ \mathcal {G}\end{bmatrix}, \end{aligned}$$
(11)

where \(\mathcal {T}= \mathcal {M}\circledcirc \mathcal {S}\) and \(\mathcal {H}= \mathcal {G}\circledcirc \mathcal {S}.\) The sizes of matrices \(\mathcal {T}\) and \(\mathcal {H}\) are \(t \times n\) and \(h \times n\), respectively. Then we have \(t = ms = 2m \log {n}\) and \(h = gs = 2g \log {n}\). Note that the matrix \(\mathcal {T}\) is the same as the one in (9). The number of tests of the measurement matrix \(\mathcal {V}\) is

$$\begin{aligned} v = t + h + g = ms + gs + g = O( (m + g)s) = \varTheta \left( \lambda ^3 \log {n} \right) . \end{aligned}$$

Then it is easy to see that each column of matrix \(\mathcal {V}\) can be generated in time \((t_1 + t_2) \times s + t_2 = \mathsf {poly}(v)\).

Any input vector \(\mathbf {x}= (x_1, \ldots , x_n)^T \in \{0, 1, -\infty \}^n\) contains at most d 1’s and at most h \(-\infty \)’s as described in Sect. 1.2. The outcome vector using measurement matrix \(\mathcal {T}\), i.e., \(\mathbf {y}= \mathcal {T}\otimes \mathbf {x}\), is the same as the one in Sect. 4.1. The binary outcome vector using the measurement matrix \(\mathcal {H}\) is

$$\begin{aligned} \mathbf {h}= \mathcal {H}\otimes \mathbf {x}= \begin{bmatrix} \mathbf {h}_1 \\ \vdots \\ \mathbf {h}_{g} \end{bmatrix}, \end{aligned}$$
(12)

where \(\mathbf {h}_i = \left( \mathcal {S}\times \mathrm {diag}(\mathcal {G}_{i, *}) \right) \,\otimes \,\mathbf {x}= \begin{bmatrix} h_{(i-1)s + 1} \\ \vdots \\ h_{is} \end{bmatrix}\), \(h_{(i - 1)s + l} = 1\) iff \(\sum _{j = 1}^n g_{ij} s_{lj} x_j \ge 1\), and \(h_{(i - 1)s + l} = 0\) otherwise, for \(i = 1, \ldots , g\) and \(l = 1, \ldots , s\). Therefore, the outcome vector using the measurement matrix \(\mathcal {V}\) in (11) is:

$$\begin{aligned} \mathbf {v}= \mathcal {V}\otimes \mathbf {x}= \begin{bmatrix} \mathcal {T}\\ \mathcal {H}\\ \mathcal {G}\end{bmatrix} \otimes \mathbf {x}= \begin{bmatrix} \mathcal {T}\otimes \mathbf {x}\\ \mathcal {H}\otimes \mathbf {x}\\ \mathcal {G}\otimes \mathbf {x}\end{bmatrix} = \begin{bmatrix} \mathbf {y}\\ \mathbf {h}\\ \mathbf {g}\end{bmatrix}, \end{aligned}$$
(13)

where \(\mathbf {y}\) is the same as the one in Sect. 4.1, \(\mathbf {h}\) is defined in (12), and \(\mathbf {g}= \mathcal {G}\otimes \mathbf {x}= (r_1, \ldots , r_g)^T.\) We assume that \(0 \times (-\infty ) = 0\) and that there are at most \(e = (z-1)/2\) incorrect outcomes in the outcome vector \(\mathbf {v}.\)

5.3 Decoding Procedure

Given the outcome vector \(\mathbf {v}\), the number of items n, the number of tests in matrix \(\mathcal {M}\), the number of tests in matrix \(\mathcal {G}\), the maximum number of errors e, and functions to generate the matrices \(\mathcal {V}\), \(\mathcal {G}\), \(\mathcal {M}\), and \(\mathcal {S}\), the proposed scheme proceeds as described in Algorithm 4. Steps 1 to 2 divide the outcome vector \(\mathbf {v}\) into three smaller vectors \(\mathbf {y}, \mathbf {h},\) and \(\mathbf {g}\) as in (13). Step 3 then obtains the defective set. All potential inhibitors are identified in Steps 5 to 12. Steps 14 to 23 then remove most of the false inhibitors. Since there may be some duplicate inhibitors and some remaining false inhibitors in the inhibitor set, Steps 25 to 31 remove the remaining false inhibitors and make each element in the inhibitor set unique. Finally, Step 32 returns the defective set and the inhibitor set.

figure d

5.4 Correctness of the Decoding Procedure

Because of the construction of \(\mathcal {V}\), the three vectors split from the outcome vector \(\mathbf {v}\) in Step 2 are \(\mathbf {y}= \mathcal {T}\otimes \mathbf {x}, \mathbf {h}= \mathcal {H}\otimes \mathbf {x},\) and \(\mathbf {g}= \mathcal {G}\otimes \mathbf {x}.\) Therefore, the set D achieved in Step 3 is the defective set as analyzed in Sect. 4.

Let H be the true inhibitor set, which we will identify. Since \(\mathcal {G}\) is a \((d + h - 2, 2; z]\)-disjunct matrix, for any \(j_1 \in H\) (H is not yet known) and \(j_2 \in D\), there exist at least z rows \(i_0\) such that \(g_{i_0 j_1} = g_{i_0 j_2} = 1\) and \(g_{i_0 j^\prime } = 0\) for all \(j^\prime \in D \cup H \setminus \{j_1, j_2 \}.\) Then, since there are at most \(e = (z-1)/2\) errors in \(\mathbf {v}\), there exist at least \(z - e = e + 1\) indices \(i_0\) such that \(\mathbf {h}_{i_0} = \mathcal {S}_{j_1} \underline{\vee } \mathcal {S}_{j_2}.\) As analyzed in Sect. 5.1, for any vector that is the union of the column corresponding to a defective item and the column corresponding to an inhibitor item, the inhibitor item can always be identified if the defective item is known. Therefore, the set \(H_0^*\) obtained from Steps 7 to 12 contains all inhibitors and may contain some false inhibitors. Our next goal is to remove the false inhibitors.

To remove the false inhibitors, we first remove all defective items from the set \(H_0^*\) in Step 16. Therefore, only inhibitors and negative items remain in \(H_0^*\) after Step 16. We then exploit the defining property of an inhibitor: it makes a test outcome negative whenever at least one inhibitor and at least one defective item appear in the same test. We pick an arbitrary defective item \(y \in D\) and generate its corresponding column \(\mathcal {G}_y\) in the matrix \(\mathcal {G}.\) Since \(\mathcal {G}\) is a \((d + h - 2, 2; z]\)-disjunct matrix and there are at most \(e = (z-1)/2\) errors in \(\mathbf {v}\), for any \(j_1 \in H\) (H is not yet known) and \(y \in D\), there exist at least \(z - e = e + 1\) rows \(i_0\) such that \(g_{i_0 j_1} = g_{i_0 y} = 1\) and \(g_{i_0 j^\prime } = 0\) for all \(j^\prime \in D \cup H \setminus \{j_1, y \}.\) The outcomes of these tests are negative. Therefore, Steps 14 to 23 remove most of the false inhibitors. Note that since there are at most e errors, there are at most e false inhibitors, and each of them appears at most e times in the set \(H_0^*.\) Steps 25 to 31 then completely remove the false inhibitors and make each element in the inhibitor set unique. Finally, Step 32 returns the sets of defective items and inhibitor items.
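The filtering idea behind Steps 14 to 23 can be sketched as follows. The function name and the exact thresholding are our assumptions based on the analysis above (a true inhibitor cancels the known defective in at least \(z - e = e + 1\) of the rows isolating the pair, while errors alone can produce at most e negative outcomes in such rows); this is not the paper's literal pseudocode.

```python
def is_true_inhibitor(col_c, col_y, outcomes, e):
    """Keep candidate c iff at least e + 1 of the tests containing both c and
    the known defective y are negative: a true inhibitor cancels y in at least
    z - e = e + 1 isolating rows, while errors alone give at most e negatives."""
    negatives = sum(1 for gc, gy, out in zip(col_c, col_y, outcomes)
                    if gc == 1 and gy == 1 and out == 0)
    return negatives >= e + 1

# Toy example with e = 1: candidate c1 shares two negative tests with the
# defective y (kept as a true inhibitor); c2 shares only positive tests.
col_y    = [1, 1, 1, 0, 1]   # hypothetical column of the known defective y in G
outcomes = [0, 0, 1, 1, 1]   # hypothetical outcome vector g
col_c1   = [1, 1, 0, 0, 0]
col_c2   = [0, 0, 1, 0, 1]
print(is_true_inhibitor(col_c1, col_y, outcomes, e=1))  # True
print(is_true_inhibitor(col_c2, col_y, outcomes, e=1))  # False
```

The per-candidate cost is linear in g, in line with the O(g) bound for Steps 19 to 22 in Sect. 5.5.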

5.5 Decoding Complexity

First, we find all potential inhibitors. It takes time O(v) for Step 2. The time to get the defective set D is \(O \left( \frac{sm^{2.5}}{(d + h)^2} \right) = O\left( \frac{\lambda ^5 \log {n}}{(d+h)^2} \right) \) as analyzed in Theorem 4. Steps 7 and 8 have up to g and \(|D| \le d\) loops, respectively. Since Step 9 takes time O(s), the running time from Steps 7 to 12 is O(gds) and the cardinality of \(H_0^*\) is up to gd.

Second, we analyze the complexity of removing false inhibitors. Step 15 takes time \(t_1\) as in (10). Since \(|H_0^*| \le gd\), the number of loops at Step 17 is at most gd. For the next step, it takes time \(t_2\) for Step 18 as in (10). And it takes time O(g) from Steps 19 to 22. As a result, it takes time \(O(t_1 + gd(t_2 + g))\) for Steps 14 to 23.

Finally, Steps 25 to 31 remove duplicate inhibitors from the inhibitor set H. This takes time O(gd) because \(|H_0^*| \le gd.\)

In summary, the decoding complexity is:

$$\begin{aligned}&O \left( \frac{sm^{2.5}}{(d + h)^2} \right) + O(gds) +O(t_1 + gd \times (t_2 + g)) + O(gd) \\&= O \left( \frac{sm^{2.5}}{(d + h)^2} \right) + O(gd (t_2 + g)) = O\left( \frac{\lambda ^5 \log {n}}{(d+h)^2} \right) + O \left( d \lambda ^3 \times \left( \frac{\lambda ^4}{(d + h)^2} + \lambda ^3 \right) \right) \\&= O \left( d \lambda ^6 \times \max \left\{ \frac{\lambda }{(d+h)^2}, 1 \right\} \right) . \end{aligned}$$

6 Simulation

In this section, we visualize the number of tests and the decoding times summarized in Table 1. We evaluated variations of our proposed scheme by simulation using \(d = 2, 4, \ldots , 2^{10}\), \(h = 0.2d\), and \(n = 2^{32}\) in Matlab R2015a on an HP Compaq Pro 8300SF desktop PC with a 3.4-GHz Intel Core i7-3770 processor and 16-GB memory. Two scenarios are considered here: identification of defective items only (corresponding to Sect. 4) and identification of both defectives and inhibitors (corresponding to Sect. 5). For each scenario, two noise models are considered for the test outcomes: a noiseless setting and a noisy setting. In the noisy setting, the number of errors is set to 100 times the sum of the number of defective items and the number of inhibitor items. Moreover, in some special cases, the number of items and the number of errors may be reconsidered.

All figures are plotted in three dimensions, in which the x-axis (on the right of the figures), the y-axis (in the middle of the figures), and the z-axis (the vertical line) represent the number of defectives, the number of inhibitors, and the number of tests, respectively. Our proposed scheme, Ganesan et al.’s scheme, and Chang et al.’s scheme are plotted in red with circle markers, in green with pentagram markers, and in blue with asterisk markers, respectively. In the noisy setting, Ganesan et al.’s scheme is not plotted because the authors of that scheme did not consider the noisy setting.

Regarding decoding time, when the number of items is sufficiently large, the decoding time of our proposed scheme is smaller than those of Chang et al.’s scheme and Ganesan et al.’s scheme.

6.1 Identification of Defective Items

We illustrate here the decoding time when defective items are the only items we want to recover. When there are no errors in the test outcomes, as shown in Fig. 1, the decoding time of our proposed scheme is the lowest. Since the decoding times of our proposed scheme and Ganesan et al.’s scheme are nearly equal, only one line is visible in the left subfigure of Fig. 1. Therefore, we zoomed in on those lines to see how close the two decoding times are. As plotted in the right subfigure of Fig. 1, when the number of defective items and the number of inhibitor items are small, the decoding time of our proposed scheme is always smaller than that of Ganesan et al.’s scheme. As the number of defective items and the number of inhibitor items increase, the decoding time of our proposed scheme first becomes larger than that of Ganesan et al.’s scheme, though it becomes smaller again after the number of items reaches some threshold. We note that if the numbers of defective items and inhibitor items are fixed while the total number of items is sufficiently large, the decoding time of our proposed scheme is always smaller than those of Chang et al.’s scheme and Ganesan et al.’s scheme.

When some erroneous outcomes are allowed, the decoding time in our proposed scheme is always smaller than the one in Chang et al.’s scheme as shown in Fig. 2.

Fig. 1.
figure 1

Decoding time vs. number of defectives and number of inhibitors for identifying only defective items when there are no errors in test outcomes.

Fig. 2.
figure 2

Decoding time vs. number of defectives and number of inhibitors for identifying only defective items with presence of erroneous outcomes.

6.2 Identification of Defectives and Inhibitors

We illustrate here the decoding time for classifying all items. In principle, the decoding time of our proposed scheme is the smallest in comparison with those of Chang et al.’s scheme and Ganesan et al.’s scheme when the number of items is sufficiently large. When there are no errors in the test outcomes, the decoding time of the proposed scheme is the smallest when the number of items is at least \(2^{66}\), as shown in subfigure (b) of Fig. 3. When some erroneous outcomes are allowed, the decoding time of our proposed scheme is smaller than that of Chang et al.’s scheme when the number of items is at least \(2^{61}\), as shown in subfigure (b) of Fig. 4.

Fig. 3.
figure 3

Decoding time vs. number of defectives and number of inhibitors for classifying items when there are no errors in test outcomes.

Fig. 4.
figure 4

Decoding time vs. number of defectives and number of inhibitors for classifying items when there are some erroneous outcomes.

7 Conclusion

We have presented two schemes that efficiently identify up to d defective items and up to h inhibitors in the presence of e erroneous outcomes in time \(\mathsf {poly}(d, h, e, \log {n})\). This decoding complexity is substantially less than that of state-of-the-art schemes, whose decoding complexity is \(\mathsf {poly}(d, h, e, n)\). However, the number of tests of our proposed schemes is slightly higher. Moreover, we have not considered the inhibitor complex model [12], in which each inhibitor in this work would be replaced by a bundle of inhibitors. Such a model is much more complicated and is left for future work.