| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Group testing | 5/10 | https://en.wikipedia.org/wiki/Group_testing | reference | science, encyclopedia | 2026-05-05T09:50:23.496143+00:00 | kb-cron |
Suppose a non-adaptive group testing procedure for $n$ items consists of the tests $S_{1},S_{2},\dots ,S_{t}$ for some $t\in \mathbb {N} _{\geq 0}$. The testing matrix for this scheme is the $t\times n$ binary matrix, $M$, where $(M)_{ij}=1$ if and only if $j\in S_{i}$ (and is zero otherwise). Thus each column of $M$ represents an item and each row represents a test, with a $1$ in the $(i,j)$-th entry indicating that the $i$-th test included the $j$-th item and a $0$ indicating otherwise. As well as the vector $\mathbf {x}$ (of length $n$) that describes the unknown defective set, it is common to introduce the result vector, which describes the results of each test.
Let $t$ be the number of tests performed by a non-adaptive algorithm. The result vector, $\mathbf {y} =(y_{1},y_{2},\dots ,y_{t})$, is a binary vector of length $t$ (that is, $\mathbf {y} \in \{0,1\}^{t}$) such that $y_{i}=1$ if and only if the result of the $i$-th test was positive (i.e. the test contained at least one defective). With these definitions, the non-adaptive problem can be reframed as follows: first a testing matrix, $M$, is chosen, after which the vector $\mathbf {y}$ is returned. Then the problem is to analyse $\mathbf {y}$ to find some estimate for $\mathbf {x}$. In the simplest noisy case, where there is a constant probability, $q$, that a group test will have an erroneous result, one considers a random binary vector, $\mathbf {v}$, where each entry has a probability $q$ of being $1$, and is $0$ otherwise. The vector that is returned is then ${\hat {\mathbf {y} }}=\mathbf {y} +\mathbf {v}$, with the usual addition on $(\mathbb {Z} /2\mathbb {Z} )^{t}$ (equivalently this is the element-wise XOR operation). A noisy algorithm must estimate $\mathbf {x}$ using ${\hat {\mathbf {y} }}$ (that is, without direct knowledge of $\mathbf {y}$).
=== Bounds for non-adaptive algorithms ===
The matrix representation makes it possible to prove some bounds on non-adaptive group testing. The approach mirrors that of many deterministic designs, where $d$-separable matrices are considered, as defined below.
A binary matrix, $M$, is called $d$-separable if every Boolean sum (logical OR) of any $d$ of its columns is distinct. Additionally, the notation ${\bar {d}}$-separable indicates that every sum of any of up to $d$ of $M$'s columns is distinct. (This is not the same as $M$ being $k$-separable for every $k\leq d$.) When $M$ is a testing matrix, the property of being $d$-separable (${\bar {d}}$-separable) is equivalent to being able to distinguish between (up to) $d$ defectives. However, it does not guarantee that this will be straightforward. A stronger property, called disjunctness, does.
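A brute-force check makes the distinction between $d$-separable and ${\bar {d}}$-separable concrete. The sketch below is illustrative only: the function names and the example matrix are hypothetical, and it assumes that "up to $d$ columns" includes the empty selection (zero defectives). It enumerates every column subset, so it is only practical for very small matrices.

```python
from itertools import combinations

def boolean_sum(M, cols):
    """Element-wise OR of the chosen columns of M (all zeros for no columns)."""
    return tuple(int(any(row[j] for j in cols)) for row in M)

def is_separable(M, d, up_to=False):
    """Check d-separability, or d-bar-separability when up_to=True.

    Assumption: the d-bar variant also counts the empty selection of columns.
    """
    n = len(M[0])
    sizes = range(0, d + 1) if up_to else [d]
    seen = set()
    for k in sizes:
        for cols in combinations(range(n), k):
            s = boolean_sum(M, cols)
            if s in seen:          # two different column sets give the same sum
                return False
            seen.add(s)
    return True

# Hypothetical 3 x 3 matrix with columns (1,0,0), (1,1,0) and (0,0,1).
M = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]

one_sep = is_separable(M, 1)
two_sep = is_separable(M, 2)
two_bar_sep = is_separable(M, 2, up_to=True)
```

This matrix is both 1-separable and 2-separable, yet it is not ${\bar {2}}$-separable: the OR of the first two columns equals the second column alone, so one defective cannot be told apart from two.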
A binary matrix, $M$, is called $d$-disjunct if the Boolean sum of any $d$ columns does not contain any other column. (In this context, a column A is said to contain a column B if for every index where B has a 1, A also has a 1.) A useful property of $d$-disjunct testing matrices is that, with up to $d$ defectives, every non-defective item will appear in at least one test whose outcome is negative. This means there is a simple procedure for finding the defectives: just remove every item that appears in a negative test. Using the properties of $d$-separable and $d$-disjunct matrices, the following can be shown for the problem of identifying $d$ defectives among $n$ total items.
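The elimination procedure described above can be sketched as follows, together with a brute-force disjunctness check. The function names and the example matrix (6 items, 4 tests, each item placed in a distinct pair of tests) are assumptions chosen for illustration; the check enumerates all column subsets and is only meant for small cases.

```python
from itertools import combinations

def is_disjunct(M, d):
    """True if no OR of d columns of M contains any other column."""
    n = len(M[0])
    for cols in combinations(range(n), d):
        union = [int(any(row[j] for j in cols)) for row in M]
        for k in range(n):
            if k in cols:
                continue
            # Column k is contained if the union has a 1 wherever column k does.
            if all(u >= row[k] for u, row in zip(union, M)):
                return False
    return True

def decode_by_elimination(M, y):
    """Remove every item that appears in a negative test; return the rest."""
    n = len(M[0])
    cleared = set()
    for row, yi in zip(M, y):
        if yi == 0:
            cleared.update(j for j in range(n) if row[j])
    return [j for j in range(n) if j not in cleared]

# Hypothetical design: item j is placed in the j-th pair of tests out of 4.
pairs = list(combinations(range(4), 2))
M = [[1 if i in p else 0 for p in pairs] for i in range(4)]

disjunct_1 = is_disjunct(M, 1)
disjunct_2 = is_disjunct(M, 2)

# Item 3 sits in tests 1 and 2, so those are the only positive outcomes.
y = [0, 1, 1, 0]
found = decode_by_elimination(M, y)
```

The example matrix is 1-disjunct but not 2-disjunct, so elimination recovers the single defective, `[3]`. With two defectives such as items 0 and 5, every test is positive and nothing can be eliminated, which shows why the guarantee only holds for up to $d$ defectives.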
The number of tests needed for an asymptotically small average probability of error scales as $O(d\log _{2}n)$. The number of tests needed for an asymptotically small maximum probability of error scales as $O(d^{2}\log _{2}n)$. The number of tests needed for a zero probability of error scales as $O\left({\frac {d^{2}\log _{2}n}{\log _{2}d}}\right)$.
== Generalised binary-splitting algorithm ==
The generalised binary-splitting algorithm is an essentially-optimal adaptive group-testing algorithm that finds $d$ or fewer defectives among $n$ items as follows: