---
title: "Design effect"
chunk: 7/12
source: "https://en.wikipedia.org/wiki/Design_effect"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:49:56.844427+00:00"
instance: "kb-cron"
---
Frequency weights are a basic type of weighting presented in introductory statistics courses. With these, each weight is an integer that indicates the absolute frequency of an item in the sample. These are also sometimes termed repeat (or occurrence) weights. The specific value has an absolute meaning that is lost if the weights are transformed, such as when scaling. For example: if we have the numbers 10 and 20 with frequency weights of 2 and 3, then "spreading" our data gives 10, 10, 20, 20, 20 (with a weight of 1 for each of these items). Frequency weights include the amount of information contained in a dataset, and thus allow things like unbiased weighted variance estimation using Bessel's correction. Notice that such weights are often random variables, since the specific number of items we will see from each value in the dataset is random.
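The "spreading" example above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `freq_weighted_mean` and `freq_weighted_variance` are illustrative and not taken from any library. It assumes the weights are true integer counts, which is what makes the Bessel-corrected denominator (sum of weights minus 1) meaningful.

```python
from statistics import variance


def freq_weighted_mean(values, freqs):
    """Mean of the data described by (value, integer count) pairs."""
    total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total


def freq_weighted_variance(values, freqs):
    """Bessel-corrected variance; valid only when freqs are true counts."""
    total = sum(freqs)
    mean = freq_weighted_mean(values, freqs)
    squared_dev = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return squared_dev / (total - 1)


values = [10, 20]
freqs = [2, 3]

# "Spread" the data: repeat each value according to its frequency weight.
spread = [x for x, f in zip(values, freqs) for _ in range(f)]

# Both routes describe the same five observations.
weighted_var = freq_weighted_variance(values, freqs)
spread_var = variance(spread)
```

Rescaling the counts (say, to 0.2 and 0.3) would leave the weighted mean unchanged but would break the `total - 1` denominator, which illustrates why the absolute value of a frequency weight matters.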
Inverse-variance weighting, also known as analytic weights, is when each element is assigned a weight that is the inverse of its (known) variance. When all elements have the same expectation, using such weights to calculate a weighted average yields the smallest variance among all weighted averages. In the common formulation, these weights are known and not random.
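As a small illustration of the minimum-variance property, the sketch below compares the variance of an inverse-variance weighted average with that of an equally weighted average. It assumes independent elements with known variances; the function names and the example variances (1 and 4) are made up for illustration.

```python
def inverse_variance_weights(variances):
    """Weight each element by 1 / variance (variances assumed known and > 0)."""
    return [1.0 / v for v in variances]


def variance_of_weighted_mean(weights, variances):
    """Variance of sum(w_i * x_i) / sum(w_i) for independent x_i."""
    total = sum(weights)
    return sum((w / total) ** 2 * v for w, v in zip(weights, variances))


# Two independent measurements of the same quantity (hypothetical variances).
variances = [1.0, 4.0]

iv_weights = inverse_variance_weights(variances)
var_inverse = variance_of_weighted_mean(iv_weights, variances)
var_equal = variance_of_weighted_mean([1.0, 1.0], variances)
```

Here the inverse-variance weighted average has variance 0.8, compared with 1.25 for the simple average of the two measurements.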
Normalized (convex) weights are a set of weights that form a convex combination, i.e., each weight is a number between 0 and 1, and the sum of all weights is equal to 1. Any set of non-negative weights (with a positive sum) can be turned into normalized weights by dividing each weight by the sum of all weights.
A related form is weights normalized to sum to the sample size (n). These (non-negative) weights sum to the sample size (n), and their mean is 1. Any set of weights can be normalized to the sample size by dividing each weight by the average of all weights. These weights have a convenient relative interpretation: elements with weights larger than 1 are more "influential" (in terms of their relative influence on, say, the weighted mean) than the average observation, while elements with weights smaller than 1 are less "influential" than the average observation.
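Both normalizations can be sketched in a few lines of plain Python. The function names and the example numbers below are illustrative assumptions; the point is only that either rescaling leaves the weighted mean unchanged.

```python
def normalize_to_one(weights):
    """Rescale non-negative weights so that they sum to 1 (convex weights)."""
    total = sum(weights)
    return [w / total for w in weights]


def normalize_to_n(weights):
    """Rescale weights so that they sum to the sample size (mean weight 1)."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


raw = [2.0, 3.0, 5.0]   # hypothetical raw weights
y = [1.0, 4.0, 7.0]     # hypothetical outcome values

convex = normalize_to_one(raw)    # approximately [0.2, 0.3, 0.5]
relative = normalize_to_n(raw)    # approximately [0.6, 0.9, 1.5]
```

In `relative`, the third element has a weight of about 1.5, so it counts roughly one and a half times as much as an average observation when computing the weighted mean.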
Inverse probability weighting, or simply probability weights, is when each element is given a weight that is (proportional to) the inverse of the probability of selecting that element, e.g., by using $w_i = \frac{1}{p_i}$. With inverse probability weights, we learn how many items each element "represents" in the target population. Hence, the sum of such weights returns the size of the target population of interest. Inverse probability weights can be normalized to sum to 1 or normalized to sum to the sample size (n), and many of the calculations from the following sections will yield the same results.
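The interpretation of each weight as the number of population items an element "represents" can be sketched as follows. The selection probabilities and outcome values are invented for illustration, and `inverse_probability_weights` is a hypothetical helper, not a library function.

```python
def inverse_probability_weights(probs):
    """Return w_i = 1 / p_i for selection probabilities p_i in (0, 1]."""
    if any(p <= 0.0 or p > 1.0 for p in probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return [1.0 / p for p in probs]


# Hypothetical sample: two elements drawn with probability 0.02,
# three elements drawn with probability 0.1.
probs = [0.02, 0.02, 0.1, 0.1, 0.1]
y = [3.0, 5.0, 8.0, 9.0, 10.0]

w = inverse_probability_weights(probs)

# Sum of weights: how many population items the sample represents.
population_size_estimate = sum(w)

# Weighted total and weighted mean of the outcome.
total_estimate = sum(wi * yi for wi, yi in zip(w, y))
mean_estimate = total_estimate / population_size_estimate
```

Dividing every weight by a constant (for example, to make the weights sum to 1 or to n) changes `population_size_estimate` and `total_estimate`, but leaves `mean_estimate` unchanged, since the constant cancels in the ratio.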
When a sample is EPSEM, all the selection probabilities are equal, and the inverse of the selection probability yields weights that are all equal to one another (they are all equal to $\frac{N}{n} = \frac{1}{f}$, where $n$ is the sample size and $N$ is the population size). Such a sample is called a self-weighting sample.
There are also indirect ways of applying "weighted" adjustments. For example, existing cases may be duplicated to impute missing observations (e.g. from non-response), with variance estimated using methods such as multiple imputation. An alternative approach is to remove (assign a weight of 0 to) some cases, for example when wanting to reduce the influence of over-sampled groups that are less essential for some analysis. Both cases are similar in nature to inverse probability weighting, but in practice they produce more or fewer rows of data (making the input potentially simpler to use in some software implementations) instead of an extra column of weights. Nevertheless, the consequences of such implementations are similar to just using weights. So while in the case of removing observations the data can easily be handled by common software implementations, the case of adding rows requires special adjustments for the uncertainty estimations. Not doing so may lead to erroneous conclusions (i.e., there is no free lunch when using an alternative representation of the underlying issues).
The term "Haphazard weights", coined by Kish, is used to refer to weights that correspond to unequal selection probabilities, but ones that are not related to the expectancy or variance of the selected elements.
==== Haphazard weights with estimated ratio-mean - Kish's design effect ====
===== Formula =====
When taking an unrestricted sample of $n$ elements, we can then randomly split these elements into $H$ disjoint strata, each of them containing some number $n_h$ of elements, so that $\sum_{h=1}^{H} n_h = n$. All elements in each stratum $h$ have some (known) non-negative weight assigned to them ($w_h$). The weight $w_h$ can be produced by the inverse of some unequal selection probability for elements in each stratum $h$ (i.e., inverse probability weighting following a procedure such as post-stratification). In this setting, Kish's design effect, for the increase in variance of the sample weighted mean due to this design (reflected in the weights), versus SRS of some outcome variable y (when there is no correlation between the weights and the outcome, i.e. haphazard weights) is:
$$\text{Deff} = \frac{n \sum_{h=1}^{H} n_h w_h^2}{\left(\sum_{h=1}^{H} n_h w_h\right)^2}$$
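The stratum-level formula above translates directly into code. The following is a minimal sketch in plain Python; the function name `kish_deff_strata` and the example stratum sizes and weights are assumptions made for illustration.

```python
def kish_deff_strata(n_h, w_h):
    """Kish's design effect from stratum sizes n_h and stratum weights w_h.

    Implements Deff = n * sum(n_h * w_h^2) / (sum(n_h * w_h))^2,
    where n = sum(n_h).
    """
    if len(n_h) != len(w_h):
        raise ValueError("n_h and w_h must have the same length")
    n = sum(n_h)
    weighted_sum = sum(nh * wh for nh, wh in zip(n_h, w_h))
    if weighted_sum == 0:
        raise ValueError("the weights must not all be zero")
    sum_of_squares = sum(nh * wh ** 2 for nh, wh in zip(n_h, w_h))
    return n * sum_of_squares / weighted_sum ** 2


# Hypothetical design: 2 elements with weight 50 and 3 elements with weight 10.
deff_unequal = kish_deff_strata([2, 3], [50.0, 10.0])

# Equal weights in every stratum (a self-weighting sample).
deff_equal = kish_deff_strata([2, 3], [7.0, 7.0])
```

With equal weights the design effect is 1, and it grows as the weights become more unequal (about 1.57 in the hypothetical design above). Multiplying all weights by a constant does not change the result, so the formula can be applied to raw or normalized weights alike.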
By treating each item as coming from its own stratum, $\forall h: n_h = 1$, Kish (in 1992) simplified the above formula to the following (well-known) version: