| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Group testing | 6/10 | https://en.wikipedia.org/wiki/Group_testing | reference | science, encyclopedia | 2026-05-05T09:50:23.496143+00:00 | kb-cron |
1. If $n \leq 2d - 2$, test the $n$ items individually. Otherwise, set $l = n - d + 1$ and $\alpha = \lfloor \log_2 (l/d) \rfloor$.
2. Test a group of size $2^{\alpha}$. If the outcome is negative, every item in the group is declared to be non-defective; set $n := n - 2^{\alpha}$ and go to step 1. Otherwise, use a binary search to identify one defective and an unspecified number, called $x$, of non-defective items; set $n := n - 1 - x$ and $d := d - 1$. Go to step 1.

The generalised binary-splitting algorithm requires no more than $T$ tests, where

$$T = \begin{cases} n & n \leq 2d - 2 \\ (\alpha + 2)d + p - 1 & n \geq 2d - 1 \end{cases}$$

For $n/d$ large, it can be shown that $T \rightarrow d \log_2 (n/d)$, which compares favorably to the $t = \frac{e}{\log_2 e} d \log_2\left(\frac{n}{d}\right)$ tests required for Li's $s$-stage algorithm. In fact, the generalised binary-splitting algorithm is close to optimal in the following sense. When $d \geq 2$ it can be shown that $T - B_I(d, n) \leq (d - 1)$, where

$$B_I(d, n) = \left\lceil \log_2 \sum_{i=0}^{d} \binom{n}{i} \right\rceil$$

is the information lower bound.
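The two steps of the generalised binary-splitting algorithm can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the noiseless `oracle` callback and its call counter are hypothetical, and the sketch assumes that `d` is at least the true number of defectives. It also computes the information lower bound $B_I(d, n)$ with integer arithmetic so the two quantities can be compared on a small instance.

```python
from math import comb


def information_lower_bound(d, n):
    """B_I(d, n) = ceil(log2(sum_{i=0}^{d} C(n, i))), computed with integers."""
    total = sum(comb(n, i) for i in range(d + 1))
    return (total - 1).bit_length()  # equals ceil(log2(total)) for total >= 1


def make_oracle(defectives):
    """Hypothetical noiseless test: a group is positive iff it holds a defective."""
    calls = [0]  # mutable counter so the caller can read the number of tests

    def oracle(group):
        calls[0] += 1
        return any(item in defectives for item in group)

    return oracle, calls


def find_one_defective(group, oracle):
    """Binary search inside a group that is known to be positive.

    Returns the defective found and the items whose status is still unknown.
    """
    unresolved = []
    while len(group) > 1:
        half = len(group) // 2
        left, right = group[:half], group[half:]
        if oracle(left):
            unresolved.extend(right)  # right half was never cleared
            group = left
        else:
            group = right  # left half is cleared; the defective is on the right
    return group[0], unresolved


def generalised_binary_splitting(items, d, oracle):
    """Return the defectives found, assuming d >= true number of defectives."""
    pool, found = list(items), []
    while pool and d > 0:
        n = len(pool)
        if n <= 2 * d - 2:
            # Step 1: the pool is too dense for grouping, so test one by one.
            found.extend(item for item in pool if oracle([item]))
            return found
        l = n - d + 1
        alpha = (l // d).bit_length() - 1  # floor(log2(l / d)), valid as l >= d
        size = 2 ** alpha
        group, rest = pool[:size], pool[size:]
        if not oracle(group):
            pool = rest  # Step 2, negative outcome: the whole group is cleared
        else:
            defective, unresolved = find_one_defective(group, oracle)
            found.append(defective)
            pool = unresolved + rest  # n := n - 1 - x
            d -= 1
    return found


oracle, calls = make_oracle({3, 42, 77})
found = generalised_binary_splitting(range(100), 3, oracle)
print(sorted(found), calls[0], information_lower_bound(3, 100))
```

The counter `calls[0]` holds the number of tests actually used on this instance, which can be compared against `information_lower_bound(3, 100)`. Once `d` reaches zero, the sketch treats the rest of the pool as non-defective, which is only justified under the stated assumption on `d`.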
== Non-adaptive algorithms ==
Non-adaptive group-testing algorithms tend to assume that the number of defectives, or at least a good upper bound on it, is known. This quantity is denoted $d$ in this section. If no bounds are known, there are non-adaptive algorithms with low query complexity that can help estimate $d$.
=== Combinatorial orthogonal matching pursuit (COMP) ===
Combinatorial Orthogonal Matching Pursuit, or COMP, is a simple non-adaptive group-testing algorithm that forms the basis for the more complicated algorithms that follow in this section.

First, each entry of the testing matrix $M$ is chosen i.i.d. to be $1$ with probability $1/d$ and $0$ otherwise.

The decoding step proceeds column-wise (i.e. by item). If every test in which an item appears is positive, then the item is declared defective; otherwise the item is assumed to be non-defective. Equivalently, if an item appears in any test whose outcome is negative, the item is declared non-defective; otherwise the item is assumed to be defective. An important property of this algorithm is that it never creates false negatives, though a false positive occurs when all locations with ones in the $j$-th column of $M$ (corresponding to a non-defective item $j$) are "hidden" by the ones of other columns corresponding to defective items.

The COMP algorithm requires no more than $ed(1+\delta)\ln(n)$ tests to have an error probability less than or equal to $n^{-\delta}$. This is within a constant factor of the lower bound for the average probability of error above.

In the noisy case, one relaxes the requirement in the original COMP algorithm that the set of locations of ones in any column of $M$ corresponding to a positive item be entirely contained in the set of locations of ones in the result vector. Instead, one allows for a certain number of "mismatches"; this number depends on both the number of ones in each column and the noise parameter, $q$. This noisy COMP algorithm requires no more than $4.36\left(\sqrt{\delta}+\sqrt{1+\delta}\right)^{2}(1-2q)^{-2}\, d \log_2 n$ tests to achieve an error probability at most $n^{-\delta}$.
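The noiseless COMP procedure described above can be sketched in Python using only the standard library. The helper names (`comp_num_tests`, `bernoulli_design`, `run_tests`, `comp_decode`) are hypothetical, and the matrix is stored as a plain list of rows. The small hand-made design shows a false positive caused by "hiding"; the random design is sized with the $ed(1+\delta)\ln(n)$ expression quoted above, rounded up.

```python
import math
import random


def comp_num_tests(n, d, delta):
    """e * d * (1 + delta) * ln(n) from the bound quoted above, rounded up."""
    return math.ceil(math.e * d * (1 + delta) * math.log(n))


def bernoulli_design(num_tests, n, d, rng):
    """Testing matrix with i.i.d. entries equal to 1 with probability 1/d."""
    return [[1 if rng.random() < 1 / d else 0 for _ in range(n)]
            for _ in range(num_tests)]


def run_tests(matrix, defectives):
    """Noiseless outcomes: a test is positive iff it contains a defective item."""
    return [any(row[j] for j in defectives) for row in matrix]


def comp_decode(matrix, outcomes):
    """Declare defective every item that appears in no negative test."""
    declared = set(range(len(matrix[0])))
    for row, positive in zip(matrix, outcomes):
        if not positive:
            declared -= {j for j, entry in enumerate(row) if entry}
    return declared


# Hand-made design with 3 tests and 4 items; item 1 is the only defective.
M = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
small = comp_decode(M, run_tests(M, {1}))  # item 0 is "hidden" by item 1

# Random design sized by the bound; true defectives always survive decoding.
rng = random.Random(0)
n, d, truth = 200, 3, {5, 50, 150}
design = bernoulli_design(comp_num_tests(n, d, delta=1.0), n, d, rng)
estimate = comp_decode(design, run_tests(design, truth))
```

In the small design, item 0 only appears in a test that also contains the defective item 1, so COMP cannot clear it and reports both. In the random design, `truth` is always a subset of `estimate`, which reflects the no-false-negatives property; whether `estimate` contains extra items depends on the random draw.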
=== Definite defectives (DD) ===
The definite defectives method (DD) is an extension of the COMP algorithm that attempts to remove any false positives. Performance guarantees for DD have been shown to strictly exceed those of COMP. The decoding step uses a useful property of the COMP algorithm: that every item that COMP declares non-defective is certainly non-defective (that is, there are no false negatives). It proceeds as follows.
1. First the COMP algorithm is run, and any non-defectives that it detects are removed. All remaining items are now "possibly defective".
2. Next the algorithm looks at all the positive tests. If an item appears as the only "possible defective" in a test, then it must be defective, so the algorithm declares it to be defective.
3. All other items are assumed to be non-defective. The justification for this last step comes from the assumption that the number of defectives is much smaller than the total number of items.

Note that steps 1 and 2 never make a mistake, so the algorithm can only make a mistake if it declares a defective item to be non-defective. Thus the DD algorithm can only create false negatives.
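The DD decoding steps can be sketched in Python as below. This is an illustrative sketch for the noiseless setting; the function name `dd_decode` and the list-of-rows matrix representation are assumptions made for the example.

```python
def dd_decode(matrix, outcomes):
    """Definite-defectives decoding for noiseless test outcomes."""
    n = len(matrix[0])

    # Step 1 (COMP): remove every item that appears in a negative test.
    possible = set(range(n))
    for row, positive in zip(matrix, outcomes):
        if not positive:
            possible -= {j for j, entry in enumerate(row) if entry}

    # Step 2: an item that is the only possible defective in a positive
    # test must be defective.
    definite = set()
    for row, positive in zip(matrix, outcomes):
        if positive:
            candidates = [j for j in possible if row[j]]
            if len(candidates) == 1:
                definite.add(candidates[0])

    # Step 3: everything else is assumed non-defective.
    return definite


# Three tests on four items; item 1 is the only defective.
M = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
outcomes = [True, True, False]
result = dd_decode(M, outcomes)

# A single positive test holding two defectives cannot confirm either one.
missed = dd_decode([[1, 1, 0]], [True])
```

In the first design, the negative third test clears items 2 and 3, which leaves item 1 as the only possible defective in the second test, so `result` contains item 1 alone and the unresolved item 0 is assumed non-defective. In the second design no item is ever alone in a positive test, so `missed` is empty even if items 0 and 1 are both defective, which illustrates the false negatives DD can produce.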