---
title: "Group testing"
chunk: 5/10
source: "https://en.wikipedia.org/wiki/Group_testing"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:23.496143+00:00"
instance: "kb-cron"
---
Suppose a non-adaptive group testing procedure for $n$ items consists of the tests $S_1, S_2, \dots, S_t$ for some $t \in \mathbb{N}_{\geq 0}$. The testing matrix for this scheme is the $t \times n$ binary matrix, $M$, where $(M)_{ij} = 1$ if and only if $j \in S_i$ (and is zero otherwise).
Thus each column of $M$ represents an item and each row represents a test, with a $1$ in the $(i,j)$-th entry indicating that the $i$-th test included the $j$-th item and a $0$ indicating otherwise.
As well as the vector $\mathbf{x}$ (of length $n$) that describes the unknown defective set, it is common to introduce the result vector, which describes the results of each test.
Let $t$ be the number of tests performed by a non-adaptive algorithm. The result vector, $\mathbf{y} = (y_1, y_2, \dots, y_t)$, is a binary vector of length $t$ (that is, $\mathbf{y} \in \{0,1\}^t$) such that $y_i = 1$ if and only if the result of the $i$-th test was positive (i.e. contained at least one defective).
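
The result vector can be computed from a testing matrix and a defective vector as sketched below. The helper name `result_vector`, the example matrix, and the choice of defective item are illustrative assumptions; the noiseless model is the one described above.

```python
def result_vector(M, x):
    """Return y, where y[i] = 1 iff test i contains at least one defective.

    M is a t x n binary testing matrix (list of rows) and x is the
    length-n binary vector marking the defective items.
    """
    return [1 if any(m and xi for m, xi in zip(row, x)) else 0 for row in M]


# Hypothetical example: 3 tests on 4 items, item 2 (0-based) is defective.
M = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]
x = [0, 0, 1, 0]

y = result_vector(M, x)
print(y)  # only the second test pools item 2, so y is [0, 1, 0]
```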
With these definitions, the non-adaptive problem can be reframed as follows: first a testing matrix, $M$, is chosen, after which the vector $\mathbf{y}$ is returned. Then the problem is to analyse $\mathbf{y}$ to find some estimate for $\mathbf{x}$.
In the simplest noisy case, where there is a constant probability, $q$, that a group test will have an erroneous result, one considers a random binary vector, $\mathbf{v}$, where each entry has a probability $q$ of being $1$, and is $0$ otherwise. The vector that is returned is then $\hat{\mathbf{y}} = \mathbf{y} + \mathbf{v}$, with the usual addition on $(\mathbb{Z}/2\mathbb{Z})^t$ (equivalently this is the element-wise XOR operation). A noisy algorithm must estimate $\mathbf{x}$ using $\hat{\mathbf{y}}$ (that is, without direct knowledge of $\mathbf{y}$).
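
A minimal simulation of this noise model is sketched below. The function name `add_noise`, the seed, and the value of `q` are illustrative assumptions, and each entry of the noise vector is drawn independently, which the text implies but does not state outright.

```python
import random


def add_noise(y, q, rng):
    """Flip each test result with probability q.

    Returns (y_hat, v), where v is the random noise vector and
    y_hat = y XOR v element-wise (addition modulo 2).
    """
    v = [1 if rng.random() < q else 0 for _ in y]
    y_hat = [yi ^ vi for yi, vi in zip(y, v)]
    return y_hat, v


rng = random.Random(0)           # fixed seed only for reproducibility
y = [0, 1, 0, 1, 1]              # hypothetical noiseless results
y_hat, v = add_noise(y, 0.1, rng)
print("noise   :", v)
print("observed:", y_hat)
```

A decoder in the noisy setting sees only `y_hat`; the vectors `y` and `v` are kept here purely to show how the observation is formed.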
=== Bounds for non-adaptive algorithms ===
The matrix representation makes it possible to prove some bounds on non-adaptive group testing. The approach mirrors that of many deterministic designs, where $d$-separable matrices are considered, as defined below.
A binary matrix, $M$, is called $d$-separable if every Boolean sum (logical OR) of any $d$ of its columns is distinct. Additionally, the notation $\bar{d}$-separable indicates that every sum of any of up to $d$ of $M$'s columns is distinct. (This is not the same as $M$ being $k$-separable for every $k \leq d$.)
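
The two definitions can be checked by brute force on small matrices, as in the sketch below. The function name `is_separable` and the example matrix are assumptions for illustration, and the "up to $d$" variant is taken here to include the empty set of columns (the zero-defective case), which is also an assumption. The example shows a matrix that is $1$-separable and $2$-separable but not $\bar{2}$-separable.

```python
from itertools import combinations


def is_separable(M, d, up_to=False):
    """Brute-force check of d-separability of a binary matrix M.

    With up_to=False, the Boolean sums of all d-subsets of columns must
    be pairwise distinct. With up_to=True, the same must hold across all
    subsets of size 0..d together (assumption: empty subset included).
    """
    t, n = len(M), len(M[0])
    sizes = range(0, d + 1) if up_to else range(d, d + 1)
    seen = set()
    for k in sizes:
        for subset in combinations(range(n), k):
            # Boolean sum (logical OR) of the chosen columns.
            s = tuple(int(any(M[i][j] for j in subset)) for i in range(t))
            if s in seen:
                return False
            seen.add(s)
    return True


# Columns are (1,0,0), (1,1,0) and (0,0,1).
M = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]

print(is_separable(M, 1))              # single columns are distinct
print(is_separable(M, 2))              # pairwise sums are distinct
print(is_separable(M, 2, up_to=True))  # column 0 OR column 1 equals column 1
```

The search enumerates every subset of columns, so it is only practical for small $n$ and $d$.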
When $M$ is a testing matrix, the property of being $d$-separable ($\bar{d}$-separable) is equivalent to being able to distinguish between (up to) $d$ defectives. However, it does not guarantee that this will be straightforward. A stronger property, called disjunctness, does.
A binary matrix, $M$, is called $d$-disjunct if the Boolean sum of any $d$ columns does not contain any other column. (In this context, a column A is said to contain a column B if for every index where B has a 1, A also has a 1.)
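
For small matrices the definition can be verified directly, as in the brute-force sketch below. The function name `is_disjunct` and both example matrices are illustrative assumptions.

```python
from itertools import combinations


def is_disjunct(M, d):
    """Brute-force check that no Boolean sum of d columns contains another column."""
    t, n = len(M), len(M[0])
    for subset in combinations(range(n), d):
        # Boolean sum (logical OR) of the d chosen columns.
        union = [any(M[i][j] for j in subset) for i in range(t)]
        for other in range(n):
            if other in subset:
                continue
            # "Contains": wherever column `other` has a 1, the union has a 1.
            if all(union[i] for i in range(t) if M[i][other]):
                return False
    return True


identity = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
nested = [[1, 1, 0],
          [0, 1, 0],
          [0, 0, 1]]   # column 1 contains column 0

print(is_disjunct(identity, 2))
print(is_disjunct(nested, 1))
```

Under this check an all-zero column is contained in every Boolean sum, so a matrix with such a column is never reported as disjunct.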
A useful property of $d$-disjunct testing matrices is that, with up to $d$ defectives, every non-defective item will appear in at least one test whose outcome is negative. This means there is a simple procedure for finding the defectives: just remove every item that appears in a negative test. The items that remain are exactly the defectives, since a defective item can never appear in a negative test.
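
The elimination procedure can be sketched as follows. The function name `decode_by_elimination` and the example are assumptions for illustration; the matrix used has as its columns all length-4 binary vectors of weight 2, so no column contains another and it is $1$-disjunct.

```python
def decode_by_elimination(M, y):
    """Return the items that never appear in a negative test.

    For a d-disjunct matrix and at most d defectives (noiseless results),
    the surviving items are the defectives.
    """
    candidates = set(range(len(M[0])))
    for row, outcome in zip(M, y):
        if outcome == 0:
            # Every item pooled in a negative test is non-defective.
            candidates -= {j for j, m in enumerate(row) if m}
    return sorted(candidates)


# 4 tests on 6 items; each item takes part in a distinct pair of tests.
M = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]

x = [0, 0, 0, 1, 0, 0]                                  # item 3 is defective
y = [int(any(m and xi for m, xi in zip(row, x))) for row in M]

print(y)                             # [0, 1, 1, 0]
print(decode_by_elimination(M, y))   # [3]
```

The decoder makes a single pass over the tests, so its cost is proportional to the size of the matrix. If the matrix is not disjunct enough for the number of defectives, the returned set may also contain non-defective items.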
Using the properties of $d$-separable and $d$-disjunct matrices, the following can be shown for the problem of identifying $d$ defectives among $n$ total items.
The number of tests needed for an asymptotically small average probability of error scales as $O(d \log_2 n)$.
The number of tests needed for an asymptotically small maximum probability of error scales as $O(d^2 \log_2 n)$.
The number of tests needed for a zero probability of error scales as $O\left(\frac{d^2 \log_2 n}{\log_2 d}\right)$.
== Generalised binary-splitting algorithm ==
The generalised binary-splitting algorithm is an essentially-optimal adaptive group-testing algorithm that finds $d$ or fewer defectives among $n$ items as follows: