| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Group testing | 4/10 | https://en.wikipedia.org/wiki/Group_testing | reference | science, encyclopedia | 2026-05-05T09:50:23.496143+00:00 | kb-cron |
$\mathbf{x}$ is intended to describe the (unknown) set of defective items. The key property of $\mathbf{x}$ is that it is an implicit input. That is to say, there is no direct knowledge of what the entries of $\mathbf{x}$ are, other than that which can be inferred via some series of 'tests'. This leads on to the next definition.
Let $\mathbf{x}$ be an input vector. A set $S \subseteq \{1, 2, \dots, n\}$ is called a test. When testing is noiseless, the result of a test is positive when there exists $j \in S$ such that $x_j = 1$, and the result is negative otherwise. Therefore, the goal of group testing is to come up with a method for choosing a 'short' series of tests that allow $\mathbf{x}$ to be determined, either exactly or with a high degree of certainty.
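The noiseless test rule above can be sketched in a few lines of Python. This is only an illustration: the function name `run_test`, the list encoding of the input vector, and the example instance are assumptions, not part of the source.

```python
from typing import Iterable, Sequence


def run_test(x: Sequence[int], S: Iterable[int]) -> bool:
    """Noiseless group test: positive iff some j in S has x_j = 1.

    Items are numbered 1..n as in the text, so x[j - 1] holds x_j.
    """
    return any(x[j - 1] == 1 for j in S)


# Hypothetical instance: n = 6 items, with items 2 and 5 defective.
x = [0, 1, 0, 0, 1, 0]

print(run_test(x, {1, 3, 4}))  # False: the pool contains no defective item
print(run_test(x, {2, 3}))     # True: item 2 is defective
```

In an actual group-testing problem the vector is hidden, and an algorithm only sees the Boolean outcomes of the tests it chooses.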
A group-testing algorithm is said to make an error if it incorrectly labels an item (that is, labels any defective item as non-defective or vice versa). This is not the same thing as the result of a group test being incorrect. An algorithm is called zero-error if the probability that it makes an error is zero.
$t(d,n)$ denotes the minimum number of tests required to always find $d$ defectives among $n$ items with zero probability of error by any group-testing algorithm. For the same quantity but with the restriction that the algorithm is non-adaptive, the notation $\bar{t}(d,n)$ is used.
=== General bounds ===
Since it is always possible to resort to individual testing by setting $S_j = \{j\}$ for each $1 \leq j \leq n$, it must be that $\bar{t}(d,n) \leq n$. Also, since any non-adaptive testing procedure can be written as an adaptive algorithm by simply performing all the tests without regard to their outcome, $t(d,n) \leq \bar{t}(d,n)$. Finally, when $0 \neq d \neq n$, there is at least one item whose defectiveness must be determined (by at least one test), and so $1 \leq t(d,n)$. In summary (when assuming $0 \neq d \neq n$), $1 \leq t(d,n) \leq \bar{t}(d,n) \leq n$.
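The upper bound of n tests comes from individual testing, which can be sketched as follows. The names `run_test` and `individual_testing` are hypothetical, and `run_test` stands in for the physical test, so it is the only place where the hidden vector is read.

```python
from typing import Iterable, List, Sequence, Tuple


def run_test(x: Sequence[int], S: Iterable[int]) -> bool:
    """Noiseless test oracle: positive iff some j in S has x_j = 1."""
    return any(x[j - 1] == 1 for j in S)


def individual_testing(x: Sequence[int]) -> Tuple[List[int], int]:
    """Recover x with the n singleton tests S_j = {j}.

    All tests are fixed in advance, so the scheme is non-adaptive.
    """
    n = len(x)
    tests = [{j} for j in range(1, n + 1)]
    outcomes = [run_test(x, S) for S in tests]
    # The outcome of test S_j is exactly x_j, so decoding is trivial.
    recovered = [1 if positive else 0 for positive in outcomes]
    return recovered, len(tests)


hidden = [0, 1, 0, 0, 1, 0]  # hypothetical instance, n = 6
recovered, used = individual_testing(hidden)
print(recovered == hidden, used)  # True 6
```

Because the tests do not depend on earlier outcomes, the same n tests also count as an adaptive procedure that ignores its feedback, which is the reasoning behind the middle inequality.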
==== Information lower bound ====
A lower bound on the number of tests needed can be described using the notion of sample space, denoted $\mathcal{S}$, which is simply the set of possible placements of defectives. For any group testing problem with sample space $\mathcal{S}$ and any group-testing algorithm, it can be shown that $t \geq \lceil \log_2 |\mathcal{S}| \rceil$, where $t$ is the minimum number of tests required to identify all defectives with a zero probability of error. This is called the information lower bound. This bound is derived from the fact that after each test, $\mathcal{S}$ is split into two disjoint subsets, each corresponding to one of the two possible outcomes of the test. However, the information lower bound itself is usually unachievable, even for small problems. This is because the splitting of $\mathcal{S}$ is not arbitrary, since it must be realisable by some test.
In fact, the information lower bound can be generalised to the case where there is a non-zero probability that the algorithm makes an error. In this form, the theorem gives us an upper bound on the probability of success based on the number of tests. For any group-testing algorithm that performs
$t$ tests, the probability of success, $\mathbb{P}(\text{success})$, satisfies $\mathbb{P}(\text{success}) \leq t / \log_2 \binom{n}{d}$. This can be strengthened to: $\mathbb{P}(\text{success}) \leq \frac{2^t}{\binom{n}{d}}$.
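As a rough numerical illustration of these bounds, the sketch below assumes that exactly d of the n items are defective, so the sample space has size equal to the binomial coefficient used in the formulas above. The function names and the example values n = 100, d = 2 are assumptions chosen for illustration only.

```python
from math import comb


def information_lower_bound(n: int, d: int) -> int:
    """ceil(log2 C(n, d)), computed with exact integer arithmetic.

    For an integer m >= 1, (m - 1).bit_length() equals ceil(log2 m).
    """
    size = comb(n, d)  # sample-space size under the exactly-d assumption
    return (size - 1).bit_length()


def success_upper_bound(t: int, n: int, d: int) -> float:
    """The strengthened bound 2^t / C(n, d), capped at 1."""
    return min(1.0, 2**t / comb(n, d))


# Hypothetical instance: n = 100 items, d = 2 defectives, C(100, 2) = 4950.
print(information_lower_bound(100, 2))   # 13
print(success_upper_bound(10, 100, 2))   # 1024 / 4950, roughly 0.207
print(success_upper_bound(13, 100, 2))   # 1.0, the bound is no longer informative
```

With fewer than 13 tests the cap is never reached for this instance, which matches the zero-error lower bound: any algorithm using at most 12 tests must fail with positive probability.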
=== Representation of non-adaptive algorithms ===
Algorithms for non-adaptive group testing consist of two distinct phases. First, it is decided how many tests to perform and which items to include in each test. In the second phase, often called the decoding step, the results of each group test are analysed to determine which items are likely to be defective. The first phase is usually encoded in a matrix as follows.