---
title: "Group testing"
chunk: 5/10
source: "https://en.wikipedia.org/wiki/Group_testing"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:23.496143+00:00"
instance: "kb-cron"
---

Suppose a non-adaptive group testing procedure for $n$ items consists of the tests $S_{1},S_{2},\dots ,S_{t}$ for some $t\in \mathbb{N}_{\geq 0}$. The testing matrix for this scheme is the $t\times n$ binary matrix, $M$, where $(M)_{ij}=1$ if and only if $j\in S_{i}$ (and is zero otherwise).
Thus each column of $M$ represents an item and each row represents a test, with a $1$ in the $(i,j)$-th entry indicating that the $i$-th test included the $j$-th item and a $0$ indicating otherwise.
As well as the vector $\mathbf{x}$ (of length $n$) that describes the unknown defective set, it is common to introduce the result vector, which describes the results of each test.

Let $t$ be the number of tests performed by a non-adaptive algorithm. The result vector, $\mathbf{y}=(y_{1},y_{2},\dots ,y_{t})$, is a binary vector of length $t$ (that is, $\mathbf{y}\in \{0,1\}^{t}$) such that $y_{i}=1$ if and only if the result of the $i$-th test was positive (i.e. contained at least one defective).
With these definitions, the non-adaptive problem can be reframed as follows: first a testing matrix, $M$, is chosen, after which the vector $\mathbf{y}$ is returned. Then the problem is to analyse $\mathbf{y}$ to find some estimate for $\mathbf{x}$.
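
The noiseless relationship between the testing matrix, the defective vector, and the result vector can be sketched as a Boolean matrix-vector product. The name `result_vector` and the example values are hypothetical; the matrix is represented as a plain list of rows.

```python
def result_vector(M, x):
    """Return y, where y[i] = 1 iff test i contains at least one defective.

    M is a t x n binary matrix (list of rows); x is a binary list of
    length n with x[j] = 1 iff item j is defective.
    """
    return [int(any(m and xj for m, xj in zip(row, x))) for row in M]


# Hypothetical example: 3 tests on 4 items, with only item 3 defective.
M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
x = [0, 0, 0, 1]
print(result_vector(M, x))  # [0, 0, 1]
```

Only the third test contains item 3, so it is the only positive entry of the result vector.
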
In the simplest noisy case, where there is a constant probability, $q$, that a group test will have an erroneous result, one considers a random binary vector, $\mathbf{v}$, of length $t$, where each entry has a probability $q$ of being $1$, and is $0$ otherwise. The vector that is returned is then $\hat{\mathbf{y}}=\mathbf{y}+\mathbf{v}$, with the usual addition on $(\mathbb{Z}/2\mathbb{Z})^{t}$ (equivalently this is the element-wise XOR operation). A noisy algorithm must estimate $\mathbf{x}$ using $\hat{\mathbf{y}}$ (that is, without direct knowledge of $\mathbf{y}$).
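
This noise model can be simulated in a few lines. The following is only a sketch: the name `noisy_results` and the optional `rng` parameter are assumptions introduced here, and each test outcome is flipped independently with probability `q`.

```python
import random


def noisy_results(y, q, rng=None):
    """Return (y_hat, v), where v flips each entry of y with probability q."""
    if rng is None:
        rng = random.Random()
    # v[i] = 1 with probability q, independently for each test.
    v = [1 if rng.random() < q else 0 for _ in y]
    # Addition in Z/2Z is the element-wise XOR.
    y_hat = [yi ^ vi for yi, vi in zip(y, v)]
    return y_hat, v


y = [0, 0, 1, 1]
y_hat, v = noisy_results(y, q=0.1, rng=random.Random(0))
# XOR-ing the noise back in recovers the true results.
print([a ^ b for a, b in zip(y_hat, v)] == y)  # True
```

A decoder would only see `y_hat`; the vector `v` is returned here purely so that the simulation can be inspected.
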

=== Bounds for non-adaptive algorithms ===
The matrix representation makes it possible to prove some bounds on non-adaptive group testing. The approach mirrors that of many deterministic designs, where $d$-separable matrices are considered, as defined below.

A binary matrix, $M$, is called $d$-separable if every Boolean sum (logical OR) of any $d$ of its columns is distinct. Additionally, the notation $\bar{d}$-separable indicates that every sum of any of up to $d$ of $M$'s columns is distinct. (This is not the same as $M$ being $k$-separable for every $k\leq d$.)
When $M$ is a testing matrix, the property of being $d$-separable ($\bar{d}$-separable) is equivalent to being able to distinguish between (up to) $d$ defectives. However, it does not guarantee that this will be straightforward. A stronger property, called disjunctness, does.
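
For small matrices, both separability properties can be checked by brute force, as in the sketch below. The function names are hypothetical, the search is exponential in $d$, and counting the empty selection of columns among the "up to $d$" cases is an assumption of this sketch.

```python
from itertools import combinations


def boolean_sum(M, cols):
    """Element-wise OR of the chosen columns of M, returned as a tuple."""
    return tuple(int(any(row[j] for j in cols)) for row in M)


def is_d_separable(M, d):
    """True if the Boolean sums of all d-subsets of columns are distinct."""
    n = len(M[0])
    sums = [boolean_sum(M, c) for c in combinations(range(n), d)]
    return len(sums) == len(set(sums))


def is_d_bar_separable(M, d):
    """True if the sums of all subsets of at most d columns are distinct."""
    n = len(M[0])
    sums = [boolean_sum(M, c)
            for k in range(d + 1)
            for c in combinations(range(n), k)]
    return len(sums) == len(set(sums))


# Hypothetical matrix whose columns are (0,0), (1,0) and (0,1).
M = [
    [0, 1, 0],
    [0, 0, 1],
]
print(is_d_separable(M, 1), is_d_separable(M, 2))  # True True
print(is_d_bar_separable(M, 2))                    # False
```

The example is 1-separable and 2-separable, yet it is not $\bar{2}$-separable: the second column on its own has the same Boolean sum as the first two columns together.
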

A binary matrix, $M$, is called $d$-disjunct if the Boolean sum of any $d$ columns does not contain any other column. (In this context, a column A is said to contain a column B if for every index where B has a 1, A also has a 1.)
A useful property of $d$-disjunct testing matrices is that, with up to $d$ defectives, every non-defective item will appear in at least one test whose outcome is negative. This means there is a simple procedure for finding the defectives: just remove every item that appears in a negative test.
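
The disjunctness condition and the elimination procedure just described can be sketched as follows. The names `is_d_disjunct` and `decode_by_elimination` are hypothetical, the disjunctness check is a brute-force search suitable only for small matrices, and the decoder assumes noiseless results.

```python
from itertools import combinations


def is_d_disjunct(M, d):
    """True if no Boolean sum of d columns contains any other column."""
    t, n = len(M), len(M[0])
    cols = [[row[j] for row in M] for j in range(n)]
    for chosen in combinations(range(n), d):
        union = [int(any(cols[j][i] for j in chosen)) for i in range(t)]
        for other in range(n):
            if other in chosen:
                continue
            # `union` contains `other` if it has a 1 wherever `other` does.
            if all(u >= c for u, c in zip(union, cols[other])):
                return False
    return True


def decode_by_elimination(M, y):
    """Remove every item that appears in a negative test; return the rest."""
    n = len(M[0])
    candidates = set(range(n))
    for row, outcome in zip(M, y):
        if outcome == 0:
            candidates -= {j for j in range(n) if row[j]}
    return sorted(candidates)


# Hypothetical example: individual testing of 4 items (identity matrix).
M = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(is_d_disjunct(M, 2))                   # True
print(decode_by_elimination(M, [0, 1, 0, 1]))  # [1, 3]
```

The identity matrix is a trivial disjunct design that uses one test per item; it is used here only because its behaviour is easy to verify by hand.
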
Using the properties of $d$-separable and $d$-disjunct matrices, the following can be shown for the problem of identifying $d$ defectives among $n$ total items.

The number of tests needed for an asymptotically small average probability of error scales as $O(d\log _{2}n)$.
The number of tests needed for an asymptotically small maximum probability of error scales as $O(d^{2}\log _{2}n)$.
The number of tests needed for a zero probability of error scales as $O\left({\frac {d^{2}\log _{2}n}{\log _{2}d}}\right)$.
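
To get a rough feel for how these three growth rates compare, the expressions inside the $O(\cdot)$ can be evaluated for particular values of $n$ and $d$. This is only an illustration: big-O notation hides constant factors, so the numbers below are not actual test counts, and the function name `growth_terms` and the chosen values are assumptions.

```python
import math


def growth_terms(n, d):
    """Evaluate the expressions inside the three O(.) bounds (requires d >= 2)."""
    log_n = math.log2(n)
    return {
        "average_error": d * log_n,
        "maximum_error": d**2 * log_n,
        "zero_error": d**2 * log_n / math.log2(d),
    }


# Hypothetical values: one million items, ten defectives.
for name, value in growth_terms(10**6, 10).items():
    print(f"{name}: {value:.1f}")
# average_error: 199.3
# maximum_error: 1993.2
# zero_error: 600.0
```
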

== Generalised binary-splitting algorithm ==

The generalised binary-splitting algorithm is an essentially-optimal adaptive group-testing algorithm that finds $d$ or fewer defectives among $n$ items as follows: