kb/Category_utility-4.md at bfa66fc0f8d5fd91d2e04bab963dcf8e44d1f02e

turtle89431 c50143bc82 Scrape wikipedia-science: 18125 new, 4386 updated, 23092 total (kb-cron)

2026-05-05 08:14:27 -07:00

3.8 KiB


title	chunk	source	category	tags	date_saved	instance
Category utility	5/5	https://en.wikipedia.org/wiki/Category_utility	reference	science, encyclopedia	2026-05-05T15:13:02.093903+00:00	kb-cron

=== Attempts at formalization ===

A variety of measures have been proposed to formally capture this notion of "category goodness," the best known of which is probably "cue validity". The cue validity of a feature $f_i$ with respect to a category $c_j$ is defined either as the conditional probability of the category given the feature, $p(c_j \mid f_i)$, or as the deviation of that conditional probability from the category base rate, $p(c_j \mid f_i) - p(c_j)$. Clearly, these measures quantify only inference from feature to category (i.e., cue validity), not inference from category to feature, which is captured by the category validity $p(f_i \mid c_j)$. Also, although cue validity was originally intended to account for the demonstrable appearance of basic categories in human cognition (categories of a particular level of generality that human learners evidently prefer), a number of major flaws in cue validity quickly emerged in this regard (and others). One attempt to address both problems, by simultaneously maximizing both cue validity and category validity, was made by Jones (1983), who defined the "collocation index" as the product $p(c_j \mid f_i)\,p(f_i \mid c_j)$, but this construction was fairly ad hoc. The category utility was introduced as a more sophisticated refinement of cue validity that attempts to quantify more rigorously the full inferential power of a class structure. As shown above, on a certain view the category utility is equivalent to the mutual information between the feature variable and the category variable. It has been suggested that the categories with the greatest overall category utility are not only the "best" in a normative sense but also those that human learners prefer to use, e.g., "basic" categories. Other related measures of category goodness are "cohesion" and "salience".
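To make the measures above concrete, here is a minimal sketch that estimates them from counts over a small sample. It assumes each item is a (set of feature names, category label) pair, which is a hypothetical format chosen for illustration. The function names `validity_measures` and `mutual_information` are likewise illustrative. The mutual-information helper treats a single feature as a binary present/absent variable, which is a simplification of the "feature variable" discussed in the text.

```python
from collections import Counter
from math import log2


def validity_measures(items, feature, category):
    """Estimate cue validity, its base-rate deviation, category validity,
    and Jones's collocation index for one feature/category pair.

    `items` is a list of (feature_set, category_label) pairs; this
    format is an assumption made for the example.
    """
    n = len(items)
    n_f = sum(1 for feats, _ in items if feature in feats)   # items with f_i
    n_c = sum(1 for _, c in items if c == category)          # items in c_j
    n_fc = sum(1 for feats, c in items if feature in feats and c == category)
    if n_f == 0 or n_c == 0:
        raise ValueError("feature and category must each occur at least once")
    cue = n_fc / n_f                  # p(c_j | f_i)
    category_validity = n_fc / n_c    # p(f_i | c_j)
    return {
        "cue_validity": cue,
        "cue_minus_base_rate": cue - n_c / n,   # p(c_j | f_i) - p(c_j)
        "category_validity": category_validity,
        "collocation_index": cue * category_validity,
    }


def mutual_information(items, feature):
    """Mutual information (in bits) between one binary feature variable
    (feature present/absent) and the category variable."""
    n = len(items)
    joint = Counter((feature in feats, c) for feats, c in items)
    f_marg = Counter(feature in feats for feats, _ in items)
    c_marg = Counter(c for _, c in items)
    return sum(
        (k / n) * log2((k / n) / ((f_marg[f] / n) * (c_marg[c] / n)))
        for (f, c), k in joint.items()
    )


items = [
    ({"wings", "feathers"}, "bird"),
    ({"wings", "feathers"}, "bird"),
    ({"wings", "fur"}, "bat"),
    ({"fur"}, "dog"),
]
print(validity_measures(items, "wings", "bird"))
print(mutual_information(items, "feathers"))
```

In this toy sample, "wings" has a cue validity of 2/3 for "bird" because bats also have wings. Its category validity is 1 because every bird has wings. This asymmetry is the one the text notes cue validity alone cannot capture. "Feathers" carries exactly 1 bit about the category.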

== Applications ==

Category utility is used as the category evaluation measure in COBWEB, a popular conceptual clustering algorithm.
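The sketch below shows how a clustering like COBWEB's can be scored. It implements the nominal-attribute form of category utility commonly stated for COBWEB, $CU = \frac{1}{m}\sum_{k} P(C_k)\left[\sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i \sum_j P(A_i = V_{ij})^2\right]$, where $m$ is the number of categories. It assumes every item is a dict with the same attribute names and that no cluster is empty. It only scores a given partition; it does not implement COBWEB's incremental tree-building operators. The helper names are hypothetical.

```python
from collections import Counter, defaultdict


def sum_squared_value_probs(items):
    """Sum over attributes A_i and values V_ij of P(A_i = V_ij)^2 within `items`."""
    counts = defaultdict(Counter)
    for item in items:
        for attr, value in item.items():
            counts[attr][value] += 1
    total = len(items)
    return sum((k / total) ** 2 for ctr in counts.values() for k in ctr.values())


def category_utility(partition):
    """Nominal-attribute category utility of a partition (a list of non-empty clusters)."""
    all_items = [item for cluster in partition for item in cluster]
    n = len(all_items)
    # Score with no category information, subtracted from each cluster's score.
    baseline = sum_squared_value_probs(all_items)
    total = sum(
        (len(cluster) / n) * (sum_squared_value_probs(cluster) - baseline)
        for cluster in partition
    )
    return total / len(partition)  # divide by m, the number of categories


a = {"color": "red", "shape": "round"}
b = {"color": "blue", "shape": "square"}
print(category_utility([[a, a], [b, b]]))  # 0.5
print(category_utility([[a, b], [a, b]]))  # 0.0
```

The partition that groups identical items raises the predictability of attribute values above the baseline. The mixed partition does no better than having no categories at all and scores zero.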

== See also ==

* Abstraction
* Concept learning
* Universals
* Unsupervised learning

== References ==