kb/Frequency_format_hypothesis-2.md at 338d16aba4e5bcb3b59f525cb5c54dcedcdd9542

turtle89431 48560fd30a Scrape wikipedia-science: 6474 new, 3240 updated, 9992 total (kb-cron)

2026-05-05 03:01:45 -07:00

5.0 KiB


title	chunk	source	category	tags	date_saved	instance
Frequency format hypothesis	3/3	https://en.wikipedia.org/wiki/Frequency_format_hypothesis	reference	science, encyclopedia	2026-05-05T09:59:33.511693+00:00	kb-cron

=== Nested-sets hypothesis ===
Frequency-format studies tend to share a confound: when presenting frequency information, the researchers also make clear the reference class they are referring to. For example, consider these three ways of formulating the same problem.

Probability Format
"Consider a test to detect a disease that a given American has a 1/1000 chance of getting. An individual that does not have the disease has a 50/1000 chance of testing positive. An individual that does have the disease will definitely test positive. What is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs? _____%"

Frequency Format
"One out of every 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease. Imagine we have assembled a random sample of 1000 Americans. They were selected by lottery. Those who conducted the lottery had no information about the health status of any of these people. Given the information above, on average, how many people who test positive for the disease actually have the disease? _____ out of _____."

Probability Format Highlighting Set-Subset Structure of the Problem
"The prevalence of disease X among Americans is 1/1000. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, the chance is 50/1000 that someone who is perfectly healthy would test positive for the disease. Imagine we have just given the test to a random sample of Americans. They were selected by lottery. Those who conducted the lottery had no information about the health status of any of these people. What is the chance that a person found to have a positive result actually has the disease? _____%"

All three problems state that 1 in 1000 Americans has the disease, that the test has perfect sensitivity (100% of people with the disease will receive a positive test), and that 50 in 1000 healthy people will receive a positive test (i.e., false positives). However, the latter two formats additionally highlight the separate classes within the population (e.g., positive test (with disease/without disease), negative test (without disease)), and therefore make it easier for people to choose the correct class to reason with, namely people with a positive test. Reasoning with that class yields something close to the correct answer: 1/51, or about 2%.

The frequency format and the probability format that highlights the set-subset structure lead to similar rates of correct answers, whereas the plain probability format leads to fewer correct answers, as people are likely to rely on the incorrect class in that case. Research has also shown that one can reduce performance in the frequency format by disguising the set-subset relationships in the problem (just as in the standard probability format). This indicates that it is not the frequency format itself, but the highlighting of the set-subset structure, that improves judgments.
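The arithmetic behind the disease problem can be sketched in a few lines of Python. This is only an illustration of the two routes to the answer, not part of any cited study; the function names `posterior_from_probabilities` and `posterior_from_counts` are hypothetical. The first applies Bayes' rule to the stated probabilities, and the second counts the nested sets in a sample of 1000 people, rounding the expected number of false positives to 50 as the frequency wording invites.

```python
from fractions import Fraction


def posterior_from_probabilities(prevalence, sensitivity, false_positive_rate):
    """Bayes' rule: P(disease | positive test)."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * false_positive_rate
    return true_positive / (true_positive + false_positive)


def posterior_from_counts(sick_and_positive, healthy_and_positive):
    """Nested-sets view: sick positives as a share of all positives."""
    return Fraction(sick_and_positive, sick_and_positive + healthy_and_positive)


# Probability route, using the figures stated in the problem.
exact = posterior_from_probabilities(
    prevalence=Fraction(1, 1000),
    sensitivity=Fraction(1),
    false_positive_rate=Fraction(50, 1000),
)

# Counting route: in 1000 people, 1 is sick and tests positive,
# and roughly 50 healthy people also test positive (assumed rounding).
counted = posterior_from_counts(sick_and_positive=1, healthy_and_positive=50)

print(exact, float(exact))      # 20/1019, just under 2%
print(counted, float(counted))  # 1/51, just under 2%
```

Both routes land just under 2%. The counting route reaches that figure by dividing within the set of positive tests, which is the class the nested-sets account says people need to identify.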

=== Ease of comparison ===
Critics of the frequency format hypothesis argue that probability formats allow much easier comparison than frequency formats. In some cases, frequency formats do allow easy comparison: if team A wins 19 of its 29 games and team B wins 10 of its 29 games, one can clearly see that team A is much better than team B. However, comparison in the frequency format is not always this clear. If team A won 19 of its 29 games, comparing it with a team B that won 6 of its 11 games becomes much harder, because the denominators differ. In the probability format, the comparison is easy: 65.5% (19/29) is greater than 54.5% (6/11).
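As a small illustration of the team example, the following Python sketch converts each win record to a percentage so that records with different numbers of games can be compared directly. The helper name `win_rate_percent` is hypothetical and the rounding to one decimal place is an assumption made for display.

```python
def win_rate_percent(wins, games):
    """Convert a 'wins out of games' record to a percentage, one decimal place."""
    return round(100 * wins / games, 1)


# Same denominator: the raw counts already compare easily (19 vs 10 of 29).
team_a = win_rate_percent(19, 29)
team_b_same_games = win_rate_percent(10, 29)

# Different denominators: 19 of 29 vs 6 of 11 is harder to judge from counts.
team_b_fewer_games = win_rate_percent(6, 11)

print(team_a, team_b_same_games, team_b_fewer_games)
print(team_a > team_b_fewer_games)
```

Once both records are on the same scale, the comparison reduces to a single greater-than check.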

=== Memory burden ===
Tooby and Cosmides argued that a frequency representation makes it easier to update one's data each time new information arrives. However, this involves updating two numbers. Returning to the team example, if team A wins its 31st game, both the number of games won (from 20 to 21) and the number of games played (from 30 to 31) have to be updated. In the probability format, the only number to be updated is the single percentage. Moreover, this number could be updated once over the course of 10 games instead of after every game, which cannot be done in the frequency format.
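The bookkeeping in the team example can be sketched as follows. The `Record` class is a hypothetical illustration, not a model of how people store such information: it shows that one new game changes both stored counts, while the percentage is a single number. The sketch derives the percentage from the counts purely for display and makes no claim about how a percentage would be updated mentally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A frequency-format record: two numbers kept side by side."""
    wins: int
    games: int

    def add_game(self, won):
        """Return the record after one more game; both counts may change."""
        return Record(self.wins + (1 if won else 0), self.games + 1)

    def percent(self):
        """The single-number summary, rounded to one decimal place."""
        return round(100 * self.wins / self.games, 1)


before = Record(wins=20, games=30)
after = before.add_game(won=True)

print(before, before.percent())
print(after, after.percent())
```

Here the win moves the record from 20 of 30 to 21 of 31, so two stored values change, whereas the percentage summary changes as one value.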

