kb/data/en.wikipedia.org/wiki/Group_testing-2.md

| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Group testing | 3/10 | https://en.wikipedia.org/wiki/Group_testing | reference | science, encyclopedia | 2026-05-05T09:50:23.496143+00:00 | kb-cron |

=== Combinatorial group testing ===

Group testing was first studied in the combinatorial context by Li in 1962, with the introduction of Li's $s$-stage algorithm. Li proposed an extension of Dorfman's '2-stage algorithm' to an arbitrary number of stages that required no more than $t = \frac{e}{\log_2(e)} d \log_2(n)$ tests to be guaranteed to find $d$ or fewer defectives among $n$ items. The idea was to remove all the items in negative tests, and divide the remaining items into groups as was done with the initial pool. This was to be done $s-1$ times before performing individual testing.

Combinatorial group testing in general was later studied more fully by Katona in 1973. Katona introduced the matrix representation of non-adaptive group-testing and produced a procedure for finding the defective in the non-adaptive 1-defective case in no more than $t = \lceil \log_2(n) \rceil$
tests, which he also proved to be optimal.

In general, finding optimal algorithms for adaptive combinatorial group testing is difficult, and although the computational complexity of group testing has not been determined, it is suspected to be hard in some complexity class. However, an important breakthrough occurred in 1972, with the introduction of the generalised binary-splitting algorithm. The generalised binary-splitting algorithm works by performing a binary search on groups that test positive, and is a simple algorithm that finds a single defective in no more than the information-lower-bound number of tests. In scenarios where there are two or more defectives, the generalised binary-splitting algorithm still produces near-optimal results, requiring at most $d-1$ tests above the information lower bound, where $d$ is the number of defectives.

Considerable improvements to this were made in 2013 by Allemann, reducing the required number of tests to less than $0.187d + 0.5\log_2(d) + 5.5$ above the information lower bound when $n/d \geq 38$ and $d \geq 10$. This was achieved by changing the binary search in the binary-splitting algorithm to a complex set of sub-algorithms with overlapping test groups. As such, the problem of adaptive combinatorial group testing with a known number or upper bound on the number of defectives has essentially been solved, with little room for further improvement.

There is an open question as to when individual testing is minmax. Hu, Hwang and Wang showed in 1981 that individual testing is minmax when
$n \leq \lfloor (5d+1)/2 \rfloor$, and that it is not minmax when $n > 3d$. It is currently conjectured that this bound is sharp: that is, individual testing is minmax if and only if $n \leq 3d$. Some progress was made in 2000 by Riccio and Colbourn, who showed that for large $n$, individual testing is minmax when $d \geq n/\log_{3/2}(3) \approx 0.369n$.
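
The binary-search idea behind generalised binary splitting can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the group-size rule (test individually once $n \leq 2d-2$, otherwise test a group of size $2^{\alpha}$ with $\alpha = \lfloor \log_2((n-d+1)/d) \rfloor$) follows the commonly cited description of the algorithm and is not spelled out in the text above. The function name, the `test` oracle and the sample data are hypothetical.

```python
def generalised_binary_splitting(items, d, test):
    """Find the defectives among `items`, assuming there are at most `d`.

    `test(group)` is a hypothetical oracle returning True when the group
    contains at least one defective. Returns (defectives, tests_used).
    """
    remaining = list(items)
    defectives = []
    tests = 0
    while remaining and d > 0:
        n = len(remaining)
        if n <= 2 * d - 2:
            # Too few items per defective for pooling to help: test one by one.
            for item in remaining:
                tests += 1
                if test([item]):
                    defectives.append(item)
            return defectives, tests
        # Assumed group-size rule: 2**alpha, alpha = floor(log2((n - d + 1) / d)).
        alpha = ((n - d + 1) // d).bit_length() - 1
        group = remaining[: 2 ** alpha]
        tests += 1
        if not test(group):
            # Negative pool: every item in it is good, so drop them all.
            remaining = remaining[len(group):]
            continue
        # Positive pool: binary search for one defective. `group` is always a
        # prefix of `remaining`, so items cleared on the way are sliced off.
        while len(group) > 1:
            half = group[: len(group) // 2]
            tests += 1
            if test(half):
                group = half
            else:
                remaining = remaining[len(half):]
                group = group[len(half):]
        defectives.append(group[0])
        remaining = remaining[1:]
        d -= 1
    return defectives, tests


# Hypothetical example: 100 items, 3 of them defective.
defective = {7, 42, 99}
found, used = generalised_binary_splitting(
    range(100), 3, lambda group: any(i in defective for i in group)
)
print(found, used)
```

On this sample the sketch identifies all three defectives with far fewer pooled tests than the 100 that individual testing would need. If the true number of defectives exceeds `d`, the loop stops early, so `d` must be a valid upper bound.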

=== Non-adaptive and probabilistic testing ===

One of the key insights in non-adaptive group testing is that significant gains can be made by eliminating the requirement that the group-testing procedure be certain to succeed (the "combinatorial" problem), and instead permitting it to have some low but non-zero probability of mis-labelling each item (the "probabilistic" problem). It is known that as the number of defective items approaches the total number of items, exact combinatorial solutions require significantly more tests than probabilistic solutions, even probabilistic solutions permitting only an asymptotically small probability of error.

In this vein, Chan et al. (2011) introduced COMP, a probabilistic algorithm that requires no more than $t = ed(1+\delta)\ln(n)$ tests to find up to $d$ defectives in $n$ items with a probability of error no more than $n^{-\delta}$. This is within a constant factor of the $t = O(d \log_2 n)$
lower bound. Chan et al. (2011) also provided a generalisation of COMP to a simple noisy model, and similarly produced an explicit performance bound, which was again only a constant (dependent on the likelihood of a failed test) above the corresponding lower bound. In general, the number of tests required in the Bernoulli noise case is a constant factor larger than in the noiseless case.

Aldridge, Baldassini and Johnson (2014) produced an extension of the COMP algorithm that added further post-processing steps. They showed that the performance of this new algorithm, called DD, strictly exceeds that of COMP, and that DD is 'essentially optimal' in scenarios where $d^2 \geq n$, by comparing it to a hypothetical algorithm that defines a reasonable optimum. The performance of this hypothetical algorithm suggests that there is room for improvement when $d^2 < n$, as well as suggesting how much improvement this might be.
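
The decoding rules of COMP and DD are simple enough to sketch directly. The Python below is an illustration under assumptions, not the authors' code: it uses a noiseless model with a random Bernoulli design in which each item joins each test independently with probability $1/d$, and it takes the DD post-processing step to be "declare an item defective when it is the only possible defective in some positive test", as DD is commonly described. All function names and parameter values are hypothetical.

```python
import math
import random


def bernoulli_design(n, t, p, rng):
    """Build t random pools; each item joins each pool with probability p."""
    return [[i for i in range(n) if rng.random() < p] for _ in range(t)]


def run_tests(pools, defectives):
    """Noiseless outcomes: a pool is positive iff it holds a defective."""
    return [any(i in defectives for i in pool) for pool in pools]


def comp(n, pools, outcomes):
    """COMP: clear every item seen in a negative pool, flag all the rest."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    return set(range(n)) - cleared


def dd(n, pools, outcomes):
    """DD (assumed rule): keep only items that are the sole possible
    defective in at least one positive pool."""
    possible = comp(n, pools, outcomes)
    definite = set()
    for pool, positive in zip(pools, outcomes):
        if positive:
            candidates = possible.intersection(pool)
            if len(candidates) == 1:
                definite |= candidates
    return definite


# Hypothetical parameters; t uses the COMP bound t = e * d * (1 + delta) * ln(n).
n, d, delta = 200, 4, 0.5
t = math.ceil(math.e * d * (1 + delta) * math.log(n))
rng = random.Random(0)
truth = set(rng.sample(range(n), d))
pools = bernoulli_design(n, t, 1 / d, rng)
results = run_tests(pools, truth)
comp_est = comp(n, pools, results)
dd_est = dd(n, pools, results)
print(t, sorted(truth), sorted(comp_est), sorted(dd_est))
```

In this noiseless setting COMP can only err by flagging good items, and DD can only err by missing defectives, so `dd_est` is always a subset of `truth`, which is always a subset of `comp_est`.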

== Formalisation of combinatorial group testing ==

This section formally defines the notions and terms relating to group testing.

The input vector, $\mathbf{x} = (x_1, x_2, \dots, x_n)$, is defined to be a binary vector of length $n$ (that is, $\mathbf{x} \in \{0,1\}^n$), with the $j$-th item being called defective if and only if $x_j = 1$. Further, any non-defective item is called a 'good' item.
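
To make the notation concrete, the following Python sketch stores an input vector $\mathbf{x}$ as a list and recovers a single defective non-adaptively with $\lceil \log_2(n) \rceil$ pooled tests, matching the bound attributed to Katona earlier in this article. The particular design (test $i$ pools every item whose index has bit $i$ set) is one standard way to achieve this and is an assumption here, as are the function names. Items are indexed from 0 rather than 1, and exactly one defective is assumed.

```python
def binary_design(n):
    """Return a t x n 0/1 test matrix with t = ceil(log2(n)) rows (at least 1).

    Entry [i][j] is 1 when item j takes part in test i, i.e. bit i of j is set.
    """
    t = max(1, (n - 1).bit_length())  # equals ceil(log2(n)) for n >= 2
    return [[(j >> i) & 1 for j in range(n)] for i in range(t)]


def outcomes(design, x):
    """Test i is positive (1) when it pools at least one defective item."""
    return [int(any(m and xj for m, xj in zip(row, x))) for row in design]


def decode(y):
    """Read the outcome vector as the binary index of the single defective."""
    return sum(bit << i for i, bit in enumerate(y))


n = 10
x = [0] * n   # input vector: x[j] == 1 means item j is defective
x[6] = 1      # hypothetical example: item 6 is the only defective
design = binary_design(n)
y = outcomes(design, x)
print(len(design), y, decode(y))  # 4 [0, 1, 1, 0] 6
```

All tests are fixed in advance, which is what makes the design non-adaptive. With no defective at all the outcome would be all zeros and decode to item 0, so the sketch is only valid under the exactly-one-defective assumption.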