kb/data/en.wikipedia.org/wiki/Group_testing-4.md


| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Group testing | 5/10 | https://en.wikipedia.org/wiki/Group_testing | reference | science, encyclopedia | 2026-05-05T09:50:23.496143+00:00 | kb-cron |

Suppose a non-adaptive group testing procedure for $n$ items consists of the tests $S_{1},S_{2},\dots ,S_{t}$ for some $t\in \mathbb{N}_{\geq 0}$. The testing matrix for this scheme is the $t\times n$ binary matrix, $M$, where $(M)_{ij}=1$ if and only if $j\in S_{i}$ (and is zero otherwise). Thus each column of $M$ represents an item and each row represents a test, with a $1$ in the $(i,j)$-th entry indicating that the $i$-th test included the $j$-th item and a $0$ indicating otherwise. As well as the vector $\mathbf{x}$ (of length $n$) that describes the unknown defective set, it is common to introduce the result vector, which describes the results of each test.

Let $t$ be the number of tests performed by a non-adaptive algorithm. The result vector, $\mathbf{y}=(y_{1},y_{2},\dots ,y_{t})$, is a binary vector of length $t$ (that is, $\mathbf{y}\in \{0,1\}^{t}$) such that $y_{i}=1$ if and only if the result of the $i$-th test was positive (i.e. the test contained at least one defective). With these definitions, the non-adaptive problem can be reframed as follows: first a testing matrix, $M$, is chosen, after which the vector $\mathbf{y}$ is returned. The problem is then to analyse $\mathbf{y}$ to find some estimate for $\mathbf{x}$.

In the simplest noisy case, where there is a constant probability, $q$, that a group test will have an erroneous result, one considers a random binary vector, $\mathbf{v}$, where each entry has a probability $q$ of being $1$, and is $0$ otherwise. The vector that is returned is then $\hat{\mathbf{y}}=\mathbf{y}+\mathbf{v}$, with the usual addition on $(\mathbb{Z}/2\mathbb{Z})^{t}$, since $\mathbf{y}$ and $\mathbf{v}$ both have length $t$ (equivalently this is the element-wise XOR operation). A noisy algorithm must estimate $\mathbf{x}$ using $\hat{\mathbf{y}}$ (that is, without direct knowledge of $\mathbf{y}$).
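
The result vector and the noise model above can be sketched as follows. This is an illustrative sketch only: the helper names `result_vector` and `add_noise`, the example matrix, and the choice to flip entries independently using Python's `random.Random` are assumptions of the example, not part of the definition.

```python
import random


def result_vector(M, x):
    """Noiseless outcomes: y[i] = 1 iff test i contains at least one defective."""
    return [int(any(m and xj for m, xj in zip(row, x))) for row in M]


def add_noise(y, q, rng):
    """Return y XOR v, where each entry of v is 1 with probability q (assumed independent)."""
    v = [int(rng.random() < q) for _ in y]
    return [yi ^ vi for yi, vi in zip(y, v)]


# Hypothetical 3 x 5 testing matrix; item 2 is the only defective.
M = [[1, 1, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [1, 0, 0, 0, 1]]
x = [0, 0, 1, 0, 0]

y = result_vector(M, x)
print(y)  # [0, 1, 0]: only the second test contains item 2

y_hat = add_noise(y, q=0.1, rng=random.Random(0))  # what a noisy algorithm observes
```

A decoder for the noisy problem would be given `M` and `y_hat` only, never `y` or `x`.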

=== Bounds for non-adaptive algorithms ===

The matrix representation makes it possible to prove some bounds on non-adaptive group testing. The approach mirrors that of many deterministic designs, where $d$-separable matrices are considered, as defined below.

A binary matrix, $M$, is called $d$-separable if the Boolean sums (logical ORs) of distinct sets of $d$ of its columns are all distinct. Additionally, the notation $\bar{d}$-separable indicates that the sums of distinct sets of up to $d$ of $M$'s columns are all distinct. (This is not the same as $M$ being $k$-separable for every $k\leq d$.) When $M$ is a testing matrix, the property of being $d$-separable ($\bar{d}$-separable) is equivalent to being able to distinguish between (up to) $d$ defectives. However, it does not guarantee that this will be straightforward. A stronger property, called disjunctness, does.
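
For small matrices, separability can be checked by brute force, which also shows why $\bar{d}$-separability is not the same as $k$-separability for every $k\leq d$. The sketch below is illustrative: the function names are hypothetical, the search is exponential in $d$, and counting the empty set (no defectives) among the sets of size up to $d$ is an assumption of this example.

```python
from itertools import combinations


def boolean_sum(M, cols):
    """Element-wise OR of the chosen columns of M, as a tuple of length t."""
    return tuple(int(any(row[j] for j in cols)) for row in M)


def is_separable(M, d, up_to=False):
    """Brute-force check of d-separability, or d-bar-separability when up_to=True."""
    n = len(M[0])
    # Assumption: for the "up to d" variant the empty set (size 0) is included.
    sizes = range(0, d + 1) if up_to else [d]
    seen = set()
    for k in sizes:
        for cols in combinations(range(n), k):
            s = boolean_sum(M, cols)
            if s in seen:
                return False  # two different column sets give the same outcome
            seen.add(s)
    return True


# Hypothetical 2 x 3 matrix whose first column is all zeros.
M = [[0, 1, 0],
     [0, 0, 1]]

print(is_separable(M, 1))              # True: the three columns are distinct
print(is_separable(M, 2))              # True: the three pairwise ORs are distinct
print(is_separable(M, 2, up_to=True))  # False: OR of columns 0 and 1 equals column 1
```

In this example the matrix is both 1-separable and 2-separable, yet a single defective (item 1) cannot be told apart from the pair of defectives (items 0 and 1).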

A binary matrix, $M$, is called $d$-disjunct if the Boolean sum of any $d$ columns does not contain any other column. (In this context, a column A is said to contain a column B if for every index where B has a 1, A also has a 1.) A useful property of $d$-disjunct testing matrices is that, with up to $d$ defectives, every non-defective item will appear in at least one test whose outcome is negative. This means there is a simple procedure for finding the defectives: just remove every item that appears in a negative test.

Using the properties of $d$-separable and $d$-disjunct matrices, the following can be shown for the problem of identifying $d$ defectives among $n$ total items.

- The number of tests needed for an asymptotically small average probability of error scales as $O(d\log _{2}n)$.
- The number of tests needed for an asymptotically small maximum probability of error scales as $O(d^{2}\log _{2}n)$.
- The number of tests needed for a zero probability of error scales as $O\left({\frac {d^{2}\log _{2}n}{\log _{2}d}}\right)$.
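
The disjunctness property and the simple decoding procedure described earlier in this section (remove every item that appears in a negative test) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `is_disjunct` and `decode` are hypothetical, the disjunctness check is brute force, and the example matrix (one column for each pair of tests out of four) is chosen only because it is small and 1-disjunct.

```python
from itertools import combinations


def is_disjunct(M, d):
    """Brute force: no column is contained in the Boolean sum of d other columns."""
    t, n = len(M), len(M[0])
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for cols in combinations(others, d):
            # Column j is contained if every test that includes item j
            # also includes at least one of the items in cols.
            if all(any(M[i][k] for k in cols) for i in range(t) if M[i][j]):
                return False
    return True


def decode(M, y):
    """Discard every item that appears in a negative test; return the rest."""
    n = len(M[0])
    cleared = {j for row, yi in zip(M, y) if yi == 0 for j in range(n) if row[j]}
    return [j for j in range(n) if j not in cleared]


# Hypothetical design: t = 4 tests, n = 6 items, item j is in the two tests of pairs[j].
pairs = list(combinations(range(4), 2))
M = [[int(i in p) for p in pairs] for i in range(4)]

print(is_disjunct(M, 1))  # True
print(is_disjunct(M, 2))  # False

# Item 4 takes part in tests 1 and 3, so with item 4 defective y = [0, 1, 0, 1].
print(decode(M, [0, 1, 0, 1]))  # [4]
```

Here six items are resolved with four tests when there is at most one defective; because the matrix is not 2-disjunct, the same decoding is not guaranteed to be exact with two defectives.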

== Generalised binary-splitting algorithm ==

The generalised binary-splitting algorithm is an essentially-optimal adaptive group-testing algorithm that finds $d$ or fewer defectives among $n$ items as follows: