| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Group testing | 9/10 | https://en.wikipedia.org/wiki/Group_testing | reference | science, encyclopedia | 2026-05-05T09:50:23.496143+00:00 | kb-cron |
=== Machine learning and compressed sensing ===
Machine learning is a field of computer science that has many software applications such as DNA classification, fraud detection and targeted advertising. One of the main subfields of machine learning is the 'learning by examples' problem, where the task is to approximate some unknown function when given its value at a number of specific points. As outlined in this section, this function learning problem can be tackled with a group-testing approach.
In a simple version of the problem, there is some unknown function, $f:\{0,1\}^{N}\to \{0,1\}$, where $f(\mathbf{x})=\mathbf{a}\cdot \mathbf{x}$ and $\mathbf{a}\in \{0,1\}^{N}$ (using logical arithmetic: addition is logical OR and multiplication is logical AND). Here $\mathbf{a}$ is '$d$-sparse', which means that at most $d\ll N$ of its entries are $1$. The aim is to construct an approximation to $f$ using $t$ point evaluations, where $t$ is as small as possible. (Exactly recovering $f$ corresponds to zero-error algorithms, whereas $f$ is approximated by algorithms that have a non-zero probability of error.)
In this problem, recovering $f$ is equivalent to finding $\mathbf{a}$. Moreover, $f(\mathbf{p})=1$ if and only if there is some index, $n$, where $\mathbf{a}_{n}=\mathbf{p}_{n}=1$. Thus this problem is analogous to a group-testing problem with $d$ defectives and $N$ total items. The entries of $\mathbf{a}$ are the items, which are defective if they are $1$; $\mathbf{p}$ specifies a test; and a test is positive if and only if $f(\mathbf{p})=1$.
In reality, one will often be interested in functions that are more complicated, such as $f:\mathbb{C}^{N}\to \mathbb{C}$, again where $f(\mathbf{x})=\mathbf{a}\cdot \mathbf{x}$. Compressed sensing, which is closely related to group testing, can be used to solve this problem.
In compressed sensing, the goal is to reconstruct a signal, $\mathbf{v}\in \mathbb{C}^{N}$, by taking a number of measurements. These measurements are modelled as taking the dot product of $\mathbf{v}$ with a chosen vector. The aim is to use a small number of measurements, though this is typically not possible unless something is assumed about the signal. One such assumption (which is common) is that only a small number of entries of $\mathbf{v}$ are significant, meaning that they have a large magnitude. Since the measurements are dot products of $\mathbf{v}$, the equation $M\mathbf{v}=\mathbf{q}$ holds, where $M$ is a $t\times N$ matrix that describes the set of measurements that have been chosen and $\mathbf{q}$ is the set of measurement results. This construction shows that compressed sensing is a kind of 'continuous' group testing.
The primary difficulty in compressed sensing is identifying which entries are significant. Once that is done, there are a variety of methods to estimate the actual values of the entries. This task of identification can be approached with a simple application of group testing. Here a group test produces a complex number: the sum of the entries that are tested. The outcome of a test is called positive if it produces a complex number with a large magnitude, which, given the assumption that the significant entries are sparse, indicates that at least one significant entry is contained in the test.
There are explicit deterministic constructions for this type of combinatorial search algorithm, requiring $d2^{(\log _{2}\log _{2}N)^{O(1)}}$ measurements. However, as with group testing, these are sub-optimal, and random constructions (such as COMP) can often recover $f$ using a number of measurements that is sub-linear in $N$.
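The correspondence described in this section can be illustrated with a minimal Python sketch. It is not taken from any cited construction: the helper names `comp_decode` and `random_pools`, the Bernoulli test design with inclusion probability $1/d$, and the parameter values are all assumptions chosen for illustration. The same COMP-style decoder (clear every item that appears in a negative test) is applied first to the Boolean function $f$ and then to a real-valued signal where a test is positive when the magnitude of the pooled sum exceeds a threshold. To avoid cancellation between significant entries, the sketch assumes they are positive reals, which is a simplification of the complex-valued setting.

```python
import random

def comp_decode(pools, outcomes, n_items):
    """COMP-style decoding: clear every item that appears in a negative test."""
    candidates = set(range(n_items))
    for pool, positive in zip(pools, outcomes):
        if not positive:
            candidates -= pool
    # With noiseless outcomes this is always a superset of the true support.
    return candidates

def random_pools(n_items, n_tests, include_prob, rng):
    """Hypothetical Bernoulli design: each test includes each index independently."""
    return [{i for i in range(n_items) if rng.random() < include_prob}
            for _ in range(n_tests)]

rng = random.Random(0)
N, d, t = 200, 3, 100          # assumed sizes: t tests, fewer than N items
pools = random_pools(N, t, 1.0 / d, rng)

# Boolean case: f(p) = OR over n of (a_n AND p_n), where a is d-sparse.
support = set(rng.sample(range(N), d))            # indices n with a_n = 1
bool_outcomes = [bool(pool & support) for pool in pools]
bool_estimate = comp_decode(pools, bool_outcomes, N)

# 'Continuous' case: a test returns the sum of the tested entries of v and is
# called positive when that sum has large magnitude.
significant = set(rng.sample(range(N), d))
v = [rng.uniform(5.0, 10.0) if i in significant else rng.uniform(-0.001, 0.001)
     for i in range(N)]
threshold = 1.0                                    # assumed cut-off for "large"
cs_outcomes = [abs(sum(v[i] for i in pool)) > threshold for pool in pools]
cs_estimate = comp_decode(pools, cs_outcomes, N)

print(sorted(support), sorted(bool_estimate))
print(sorted(significant), sorted(cs_estimate))
```

In both cases the decoder never discards a true defective or significant index, because such an index only ever appears in positive tests. Whether the estimate contains extra indices depends on the random design and on $t$.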
=== Multiplex assay design for COVID-19 testing ===
During a pandemic such as the COVID-19 outbreak in 2020, virus detection assays are sometimes run using nonadaptive group testing designs. One example was provided by the Origami Assays project, which released open-source group testing designs to run on a laboratory-standard 96-well plate.
In a laboratory setting, one challenge of group testing is that constructing the mixtures can be time-consuming and difficult to do accurately by hand. Origami Assays provided a workaround for this construction problem by providing paper templates to guide the technician on how to allocate patient samples across the test wells. Using the largest group testing design (XL3), it was possible to test 1120 patient samples in 94 assay wells. If the proportion of truly positive samples was low enough, then no additional testing was required.
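The dependence on a low positive rate can be explored with a small simulation. The sketch below does not reproduce the actual Origami Assays layouts, which are not described here. It assumes a hypothetical random design in which each of the 1120 samples is placed in 3 of the 94 wells, and the function names `simulate_plate` and `fraction_resolved` are illustrative. A run counts as resolved when the pooled well results identify the positive samples without ambiguity, so that no follow-up tests would be needed.

```python
import random

def simulate_plate(n_samples, n_wells, wells_per_sample, n_positive, rng):
    """Return True if the pooled results pin down the positives unambiguously."""
    # Hypothetical random design: each sample goes into a few distinct wells.
    layout = [frozenset(rng.sample(range(n_wells), wells_per_sample))
              for _ in range(n_samples)]
    positives = set(rng.sample(range(n_samples), n_positive))
    # A well tests positive if it contains at least one positive sample.
    positive_wells = set()
    for s in positives:
        positive_wells |= layout[s]
    # A sample remains suspect only if every one of its wells is positive.
    suspects = {s for s in range(n_samples) if layout[s] <= positive_wells}
    return suspects == positives

def fraction_resolved(n_positive, trials=200, seed=0):
    """Fraction of simulated plates that need no additional testing."""
    rng = random.Random(seed)
    hits = sum(simulate_plate(1120, 94, 3, n_positive, rng)
               for _ in range(trials))
    return hits / trials

for k in (0, 1, 2, 5, 10):
    print(k, fraction_resolved(k))
```

Under these assumptions the resolved fraction falls as the number of positive samples grows, since more positive wells leave more negative samples with no negative well to clear them.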