---
title: "Design effect"
chunk: 4/12
source: "https://en.wikipedia.org/wiki/Design_effect"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:49:56.844427+00:00"
instance: "kb-cron"
---

Disproportional sampling due to selection frame or procedure. This happens when a researcher deliberately over- or under-samples specific sub-populations or clusters. For example:
In stratified sampling, when units from some strata are known to have a larger variance than units from other strata. In such cases, the researcher may use this prior knowledge about how the variance differs between strata to reduce the overall variance of an estimator of some population-level parameter of interest (e.g., the mean). This can be achieved by a strategy known as optimum allocation, in which a stratum $h$ is over-sampled in proportion to a higher standard deviation and a lower sampling cost (i.e., $f_h \propto \frac{S_h}{\sqrt{C_h}}$, where $S_h$ is the standard deviation of the outcome in $h$, and $C_h$ relates to the cost of recruiting one element from $h$
). An example of an optimum allocation is Neyman's optimal allocation: when the cost of recruiting an element is the same in every stratum, the sample size for stratum $h$ is

$$n_h = n \frac{W_h S_{Uh}}{\sum_h W_h S_{Uh}}$$

where the summation is over all strata; $n$ is the total sample size; $n_h$ is the sample size for stratum $h$; $W_h = \frac{N_h}{N}$ is the relative size of stratum $h$ as compared to the entire population of size $N$; and $S_{Uh}$
is the standard deviation of the outcome in stratum $h$. A concept related to optimum allocation is optimal experimental design.

If there is interest in comparing two strata (e.g., people from two specific socio-demographic groups, or from two regions), the smaller group may be over-sampled. This way, the variance of the estimator that compares the two groups is reduced.

In cluster sampling, there may be clusters of different sizes, but the procedure samples clusters using simple random sampling (SRS), and all elements in each sampled cluster are measured (for example, if the cluster sizes are not known upfront at the stage of sampling).

In some two-stage cluster sampling designs that are based on cluster sizes. For example, when in the first stage the clusters are sampled proportionally to their estimated size (a.k.a. probability proportional to size, PPS) and in the second stage a fixed proportion of elements is chosen (e.g., half, or all, of the elements in the cluster), the selection probabilities differ for elements from different clusters. A similar case is when the first stage samples the clusters using PPS and the second stage selects a fixed number of elements in each cluster, but the cluster sizes used for the first-stage sampling were inaccurate (so that some smaller clusters may have a higher chance of being selected than they should, and some larger clusters a lower chance). In such cases, the larger the errors in the sampling probabilities used in the first stage, the more unequal the selection probabilities of the elements will be.

When the frame used for sampling includes duplicates of some items, so that some items have a larger probability of being sampled than others (e.g., if the sampling frame was created by merging several lists, or if users are recruited from several ad channels, where some users can be reached through several channels while others can be reached through only one). Different units then have different sampling probabilities, so the sampling procedure is not EPSEM.

When several different samples or frames are to be combined. For example, when running different ad campaigns to recruit respondents, or when combining results from several studies done by different researchers and/or at different times (i.e., meta-analysis).

When disproportional sampling happens due to sampling design decisions, the researcher may (sometimes) be able to trace back the decisions and accurately calculate the exact inclusion probabilities. When these selection probabilities are hard to trace back, they may be estimated using a propensity score model combined with information from auxiliary variables (e.g., age, gender).

Non-coverage. This happens, for example, if people are sampled based on some pre-defined list that doesn't include all the people in the population (e.g., a phone book, or using ads to recruit people to a survey). These units are missing due to some failure in creating the sampling frame, as opposed to a deliberate exclusion of some people (e.g., minors, people who cannot vote). The effect of non-coverage on sampling probability is considered difficult to measure (and adjust for) in various survey situations, unless strong assumptions are made. Adjustments for non-coverage can lead to inadequate weights when the relevant covariates are not used for adjustment. If there are covariates that can be used to correct for non-coverage, they are expected to lead to unequal survey weights.

Non-response. This refers to the failure to obtain measurements on sampled units that are intended to be measured. Reasons for non-response are varied and depend on the context.
A person may be temporarily unavailable, for example if they cannot answer the phone when a telephone survey is conducted. A person may also refuse to answer the survey for a variety of reasons, e.g., different tendencies of people from different ethnic, demographic, or socio-economic groups to respond in general; insufficient incentive to spend the time or share data; or the identity of the institution that is running the survey. Other reasons include inability to respond (e.g., due to illness, illiteracy, or a language barrier); the respondent not being found (e.g., they moved); and the response being lost or destroyed during encoding or transmission (i.e., measurement error). In the context of surveys, these reasons may relate to answering the entire survey or just specific questions.

Statistical adjustments. These may include methods such as post-stratification, raking, or propensity score (estimation) models, used to adjust the sample to some known (or estimated) strata sizes. These adjustments can be applied in addition to design weights, which aim to account for imbalances due to some known sampling design. Such procedures are used to mitigate issues in the sampling, ranging from sampling error and under-coverage of the sampling frame to non-response. For example, these methods can be used to make the sample more similar to some target "controls" (i.e., the population of interest), a process also called "standardization". In such cases, these adjustments help with providing unbiased estimators (often at the cost of increased variance, as seen in the following sections). If the original sample is a nonprobability sample, then post-stratification adjustments are similar to quota sampling. Note that if a simple random sample is used, post-stratification (using some auxiliary information) does not offer an estimator that is uniformly better than an unweighted estimator. However, it can be viewed as a more "robust" estimator.
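
As a rough illustration of the post-stratification adjustment mentioned under statistical adjustments, the sketch below reweights a sample so that its weighted stratum shares match known population shares. The function names, the stratum labels, and all numbers are hypothetical and chosen only for illustration; real surveys typically combine such weights with design weights.

```python
from collections import Counter


def post_stratification_weights(sample_strata, population_shares):
    """Return one weight per sampled unit.

    Each unit gets (population share of its stratum) / (sample share of its
    stratum), so the weighted stratum shares match the population shares.
    """
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


def weighted_mean(values, weights):
    """Weighted mean of the observed values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: "young" respondents are over-represented (3 of 4),
# while the population is assumed to be split 50/50.
strata = ["young", "young", "young", "old"]
outcome = [1.0, 1.0, 1.0, 5.0]
shares = {"young": 0.5, "old": 0.5}

weights = post_stratification_weights(strata, shares)
unweighted = sum(outcome) / len(outcome)
adjusted = weighted_mean(outcome, weights)

print(weights)     # unequal weights: smaller for "young", larger for "old"
print(unweighted)  # mean that ignores the imbalance
print(adjusted)    # mean after matching the assumed population shares
```

The resulting weights are unequal across units, which is the property that drives the design effect discussed in this article.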
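
The Neyman allocation formula given earlier in this section, $n_h = n \frac{W_h S_{Uh}}{\sum_h W_h S_{Uh}}$, can be sketched in a few lines. The function name and the two-stratum numbers below are assumptions made for illustration; the sketch also leaves the allocations unrounded, whereas a practical design would round them to integers and cap them at the stratum sizes.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Allocate a total sample of size n across strata.

    Implements n_h = n * W_h * S_h / sum_h(W_h * S_h), with W_h = N_h / N,
    assuming the per-element recruiting cost is the same in every stratum.
    """
    N = sum(stratum_sizes)
    products = [(N_h / N) * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]


# Hypothetical population: a large low-variance stratum and a small
# high-variance stratum.
sizes = [8000, 2000]
sds = [5.0, 20.0]

allocation = neyman_allocation(100, sizes, sds)
proportional = [100 * N_h / sum(sizes) for N_h in sizes]
sampling_fractions = [n_h / N_h for n_h, N_h in zip(allocation, sizes)]

print(allocation)          # Neyman allocation per stratum
print(proportional)        # proportional allocation, for comparison
print(sampling_fractions)  # unequal fractions imply unequal selection probabilities
```

In this made-up case the small, high-variance stratum receives a much larger sampling fraction than under proportional allocation, so elements in different strata have different selection probabilities.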