---
title: "Design effect"
chunk: 6/12
source: "https://en.wikipedia.org/wiki/Design_effect"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:49:56.844427+00:00"
instance: "kb-cron"
---
==== "Design based" vs "model based" for describing properties of estimators ====
Adjusting for unequal probability of selection through "individual case weights" (e.g., inverse probability weighting) yields various types of estimators for quantities of interest. Estimators such as the Horvitz–Thompson estimator are unbiased (if the selection probabilities are indeed known, or approximately known) for the total and the mean of the population. Deville and Särndal (1992) coined the term "calibration estimator" for estimators using weights that satisfy some condition, such as having the sum of weights equal the population size. More generally, the weighted sum of an auxiliary variable is required to equal a known population quantity: $\sum w_i x_i = X$ (e.g., that the weighted number of respondents in each age group is equal to the population size of that age group).
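As a minimal sketch of the calibration condition, the snippet below builds post-stratification weights for a small sample and checks that the weighted count in each age group matches the population size. The function name `poststratify`, the age groups, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction


def poststratify(groups, population_sizes):
    """Return one weight per respondent so weighted group counts match population_sizes."""
    sample_counts = Counter(groups)
    # Every respondent in group g gets weight N_g / n_g (exact rational arithmetic).
    return [Fraction(population_sizes[g], sample_counts[g]) for g in groups]


# Hypothetical sample: the age group of each respondent.
groups = ["18-34", "18-34", "35-64", "35-64", "35-64", "65+"]
# Hypothetical known population sizes per age group (the auxiliary totals X).
population_sizes = {"18-34": 300, "35-64": 500, "65+": 200}

weights = poststratify(groups, population_sizes)

# Check the constraint sum_i w_i * x_i = X for each group, where x_i is 1
# if respondent i belongs to the group and 0 otherwise.
weighted_counts = {
    g: sum(w for w, gi in zip(weights, groups) if gi == g)
    for g in population_sizes
}
print(weighted_counts)
```

Because the group totals are matched, the weights also sum to the overall population size in this example.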
The two primary ways to argue about the properties of calibration estimators are:
randomization based (or sampling design based) - in this case, the weights ($w_i$) and the values of the outcome of interest $y_i$ that are measured in the sample are all treated as known. In this framework, there is variability in the (known) values of the outcome ($Y$). However, the only randomness comes from which elements of the population were selected into the sample (often denoted $I_i$, taking the value 1 if element $i$ is in the sample and 0 if it is not). For a simple random sample, each $I_i$ is an IID Bernoulli random variable with some parameter $p$ (strictly, independence holds only when elements are selected independently of one another, as in Bernoulli sampling; with a fixed sample size the indicators are identically distributed but slightly dependent). For general EPSEM (equal probability of selection method) sampling, each $I_i$ will still be Bernoulli with some parameter $p$, but they may no longer be independent random variables. That is, knowing that a sample is EPSEM means that it maintains marginally equal probability of selection, but it does not inform us about the joint probability of selection. For something like post-stratification, the number of elements in each stratum can be modeled as a multinomial distribution with different inclusion probabilities $p_h$ for the elements belonging to each stratum $h$. In these cases, the sample size itself can be a random variable.
model based - in this case, the sample is fixed and the weights are fixed, but the outcome of interest is treated as a random variable. For example, in the case of post-stratification, the outcome can be modeled as a linear regression function whose independent variables are indicator variables mapping each observation to its relevant stratum, with the variability coming from the error term.
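The randomization-based view can be illustrated with a small enumeration. In the sketch below the outcome values are fixed constants, and the only thing that varies is which elements end up in the sample. The population values and the sample size are hypothetical, and the design is assumed to be simple random sampling without replacement with a fixed sample size.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical fixed population of outcomes y_i (treated as known constants).
population = [3, 7, 8, 12, 20]
N, n = len(population), 2

# Under simple random sampling without replacement, every element has
# inclusion probability n / N, so its Horvitz-Thompson weight is N / n.
weight = Fraction(N, n)

# Enumerate every possible sample; the sample is the only source of randomness.
samples = list(combinations(range(N), n))
ht_totals = [sum(weight * population[i] for i in s) for s in samples]

# Each sample is equally likely, so the design expectation is a plain average.
expected_ht_total = sum(ht_totals) / len(ht_totals)
true_total = sum(population)

print(expected_ht_total == true_total)  # the estimator is design-unbiased here
print(min(ht_totals), max(ht_totals))   # individual samples still vary
```

Averaging over all possible samples recovers the true total exactly, even though any single sample can be far from it. A model-based analysis would instead hold one sample fixed and treat the values in `population` as draws from a random model.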
As we will see later, some proofs in the literature rely on the randomization-based framework, while others focus on the model-based perspective. Moving from the mean to the weighted mean adds complexity. For example, in survey methodology the population size itself is often considered an unknown quantity that must be estimated. The weighted mean is therefore a ratio estimator, with an estimator of the total in the numerator and an estimator of the population size in the denominator (which makes the variance calculation more complex).
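The ratio structure of the weighted mean can be sketched in a few lines. The function name `weighted_mean` and the values and weights below are hypothetical.

```python
def weighted_mean(values, weights):
    """Ratio estimator: estimated total divided by estimated population size."""
    estimated_total = sum(w * y for w, y in zip(weights, values))
    estimated_size = sum(weights)
    return estimated_total / estimated_size


# Hypothetical outcomes and case weights.
y = [10.0, 20.0, 40.0]
w = [100.0, 50.0, 50.0]

mean_original = weighted_mean(y, w)
# Rescaling every weight by the same constant leaves the ratio unchanged.
mean_rescaled = weighted_mean(y, [wi / 50.0 for wi in w])
print(mean_original, mean_rescaled)
```

Since both the numerator and the denominator depend on the sample, both contribute to the variance of the estimate. The invariance to rescaling also shows that only the relative sizes of the weights matter for this particular estimator.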
==== Common types of weights ====
There are many types (and subtypes) of weights, with different ways to use and interpret them. For some weights the absolute value carries meaning, while for others only the values of the weights relative to each other matter. This section introduces some of the more common types of weights so that they can be referenced in later sections.