---
title: "Design effect"
chunk: 9/12
source: "https://en.wikipedia.org/wiki/Design_effect"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:49:56.844427+00:00"
instance: "kb-cron"
---

===== Relation to disproportionate stratified sampling =====

Kish's original definition compared the variance under some sampling design to the variance achieved through a simple random sample. Some of the literature provides the following alternative definition for Kish's design effect: "the ratio of the variance of the weighted survey mean under disproportionate stratified sampling to the variance under proportionate stratified sampling when all stratum unit variances are equal". Reflecting on this, Park and Lee (2006) stated that "The rationale behind [...][Kish's] derivation is that the loss in precision of [the weighted mean] due to haphazard unequal weighting can be approximated by the ratio of the variance under disproportionate stratified sampling to that under the proportionate stratified sampling".

Note that this alternative definition is only an approximation: if the denominator is based on "proportionate stratified sampling" (achieved via stratified sampling), then such a selection will yield a reduced variance compared with a simple random sample. This is because stratified sampling removes some of the variability in the specific number of elements per stratum, which occurs under SRS.

Relatedly, Cochran (1977) provides a formula for the proportional increase in variance due to deviation from optimum allocation (what, in Kish's formulas, would be called L).

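The alternative definition above can be checked numerically. The following is a minimal sketch, not taken from the cited literature: it assumes equal unit variance in every stratum, ignores finite-population corrections, and uses hypothetical names (`stratified_variance_ratio`, `kish_deff`) and made-up stratum shares. Under those assumptions, the ratio of the variance under a disproportionate allocation to that under a proportionate allocation equals Kish's weight-based expression n·Σw²/(Σw)², where each unit in stratum h carries a weight proportional to its population share divided by its stratum sample size.

```python
from fractions import Fraction


def stratified_variance_ratio(pop_shares, sample_sizes):
    """Variance of the weighted mean under the given allocation, divided by
    the variance under proportionate allocation with the same total n.

    Assumes equal unit variance in all strata and no finite-population
    correction, so the common variance cancels out of the ratio.
    """
    n = sum(sample_sizes)
    return n * sum(W * W / n_h for W, n_h in zip(pop_shares, sample_sizes))


def kish_deff(weights):
    """Kish's design effect for unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Hypothetical population shares and a disproportionate allocation.
pop_shares = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
sample_sizes = [20, 30, 50]

# One weight per sampled unit: population share over stratum sample size.
weights = [W / n_h for W, n_h in zip(pop_shares, sample_sizes) for _ in range(n_h)]

ratio = stratified_variance_ratio(pop_shares, sample_sizes)
deff = kish_deff(weights)
proportionate = stratified_variance_ratio(pop_shares, [50, 30, 20])

print(ratio, deff, proportionate)
```

With a proportionate allocation all weights are equal and the ratio is exactly 1; any departure from it pushes the ratio above 1.
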
===== Alternative naming conventions =====

Early papers used the term $\text{Deff}$. As more definitions of the design effect appeared, Kish's design effect for unequal selection probabilities was denoted $\text{Deff}_{\text{Kish}}$ (or $\text{Deft}_{\text{Kish}}^{2}$) or simply $\text{Deff}_{K}$ for short. Kish's design effect is also known as the "Unequal Weighting Effect" (or just UWE), termed by Liu et al. in 2002.

==== When the outcome correlates with the selection probabilities ====
===== Spencer's Deff for estimated total =====

The estimator for the total is the "p-expanded with replacement" estimator (a.k.a. the pwr-estimator, or the Hansen and Hurwitz estimator). It is based on a simple random sample (with replacement, denoted SIR) of n items ($y_k$) from a population of size N. Each item has a probability $p_k$ (k from 1 to N) of being drawn in a single draw ($\sum_U p_k = 1$, i.e. it is a multinomial distribution). The probability that a specific $y_k$ will be selected in a given draw is $p_k$. The "p-expanded with replacement" value is $Z_i = \frac{y_k}{p_k}$, with the following expectation: $E[Z_i] = E\left[I_i \frac{y_k}{p_k}\right] = \frac{y_k}{p_k} E[I_i] = \frac{y_k}{p_k} p_k = y_k$. Hence $\hat{Y}_{pwr} = \frac{1}{n} \sum_{i}^{n} Z_i$, the pwr-estimator, is an unbiased estimator for the sum total of y.

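The unbiasedness claim can be verified exactly on a toy population. The sketch below is illustrative only: the population values, draw probabilities, and function names (`pwr_estimate`, `exact_expectation`) are assumptions made for the example. It enumerates every ordered with-replacement sample of size n, weights each sample's pwr estimate by its probability, and sums. Each element k contributes $y_k$ in expectation per draw, so the result should equal the population total.

```python
from fractions import Fraction
from itertools import product


def pwr_estimate(sample, y, p):
    """pwr (Hansen and Hurwitz) estimate of the total from drawn indices."""
    return sum(y[k] / p[k] for k in sample) / len(sample)


def exact_expectation(y, p, n):
    """Exact expectation of the pwr estimate over all ordered samples of
    size n drawn with replacement, where index k has probability p[k]."""
    total = Fraction(0)
    for sample in product(range(len(y)), repeat=n):
        prob = Fraction(1)
        for k in sample:
            prob *= p[k]  # draws are independent
        total += prob * pwr_estimate(sample, y, p)
    return total


# Hypothetical population of N = 3 items with unequal draw probabilities.
y = [Fraction(10), Fraction(4), Fraction(6)]
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

expected = exact_expectation(y, p, n=2)
print(expected, sum(y))
```

Exact rational arithmetic is used so the comparison with the true total does not depend on floating-point tolerance.
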
In 2000, Bruce D. Spencer proposed a formula for estimating the design effect for the variance of estimating the total (not the mean) of some quantity ($\hat{Y}$), when there is correlation between the selection probabilities of the elements and the outcome variable of interest.

In this setup, a sample of size n is drawn (with replacement) from a population of size N. Each item is drawn with probability $P_i$ (where $\sum_{i=1}^{N} P_i = 1$, i.e. a multinomial distribution). The selection probabilities are used to define the normalized (convex) weights: $w_i = \frac{1}{nP_i}$. Notice that for some random set of n items, the sum of weights will be equal to 1 only by expectation ($E[w_i] = 1$) with some variability of the sum around it (i.e., the sum of elements from a Poisson binomial distribution). The relationship between $y_i$ and $P_i$ is defined by the following (population) simple linear regression:

$$y_i = \alpha + \beta P_i + \epsilon_i$$

where $y_i$ is the outcome of element i, which linearly depends on $P_i$ with the intercept $\alpha$ and slope $\beta$. The residual from the fitted line is $\epsilon_i = y_i - (\alpha + \beta P_i)$. We can also define the population variances of the outcome and the residuals as $\sigma_y^2$ and $\sigma_\epsilon^2$. The correlation between $P_i$ and $y_i$ is $\rho_{y,P}$.

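The population quantities defined above can be computed directly from a list of outcomes and selection probabilities. This is a minimal sketch under stated assumptions: the data are made up, the function name `population_regression` is hypothetical, and variances are taken as population variances (dividing by N). It fits the least-squares line of $y_i$ on $P_i$ and returns the intercept, slope, correlation, and the two variances.

```python
from math import sqrt


def population_regression(y, P):
    """Least-squares line y = alpha + beta * P over the whole population,
    plus the correlation and the population (divide-by-N) variances."""
    N = len(y)
    mean_y = sum(y) / N
    mean_P = sum(P) / N
    cov = sum((yi - mean_y) * (Pi - mean_P) for yi, Pi in zip(y, P)) / N
    var_P = sum((Pi - mean_P) ** 2 for Pi in P) / N
    var_y = sum((yi - mean_y) ** 2 for yi in y) / N

    beta = cov / var_P                 # slope
    alpha = mean_y - beta * mean_P     # intercept
    residuals = [yi - (alpha + beta * Pi) for yi, Pi in zip(y, P)]
    var_eps = sum(e * e for e in residuals) / N
    rho = cov / sqrt(var_P * var_y)

    return {"alpha": alpha, "beta": beta, "rho": rho,
            "var_y": var_y, "var_eps": var_eps, "residuals": residuals}


# Hypothetical population: selection probabilities sum to 1.
P = [0.1, 0.2, 0.3, 0.4]
y = [3.0, 5.0, 4.0, 8.0]

fit = population_regression(y, P)
print(fit["alpha"], fit["beta"], fit["rho"])
```

For a least-squares fit the residual variance satisfies $\sigma_\epsilon^2 = (1 - \rho_{y,P}^2)\,\sigma_y^2$, which is a convenient sanity check on the computed values.
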
Spencer's (approximate) design effect for estimating the total of y is:

$$\text{Deff}_{Spencer} = (1 - \hat{\rho}_{y,P}^{2})(1 + L) + \left(\frac{\hat{\alpha}}{\hat{\sigma}_{y}}\right)^{2} L$$

Where:

* $\hat{\rho}_{y,P}^{2}$ estimates $\rho_{y,P}^{2}$
* $\hat{\alpha}$ estimates the intercept $\alpha$