| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Category utility | 1/5 | https://en.wikipedia.org/wiki/Category_utility | reference | science, encyclopedia | 2026-05-05T15:13:02.093903+00:00 | kb-cron |

Category utility is a measure of "category goodness" defined in Gluck & Corter (1985) and Corter & Gluck (1992). It attempts to maximize both the probability that two objects in the same category have attribute values in common, and the probability that objects from different categories have different attribute values. It was intended to supersede more limited measures of category goodness such as "cue validity" and "collocation index". It provides a normative information-theoretic measure of the predictive advantage gained by the observer who possesses knowledge of the given category structure (i.e., the class labels of instances) over the observer who does not possess knowledge of the category structure. In this sense the motivation for the category utility measure is similar to the information gain metric used in decision tree learning. In certain presentations, it is also formally equivalent to the mutual information, as discussed below. A review of category utility in its probabilistic incarnation, with applications to machine learning, is provided in Witten and Frank's 2005 book.

== Probability-theoretic definition of category utility ==

The probability-theoretic definition of category utility given in Fisher (1987) and Witten and Frank (2005) is as follows:

$$CU(C,F) = \tfrac{1}{p} \sum_{c_j \in C} p(c_j) \left[ \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik} \mid c_j)^2 - \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik})^2 \right]$$

where $F = \{f_i\},\ i = 1 \ldots n$ is a size-$n$ set of $m$-ary features, and $C = \{c_j\},\ j = 1 \ldots p$ is a set of $p$ categories. The term $p(f_{ik})$ designates the marginal probability that feature $f_i$ takes on value $k$, and the term $p(f_{ik} \mid c_j)$ designates the category-conditional probability that feature $f_i$ takes on value $k$ given that the object in question belongs to category $c_j$.

The motivation and development of this expression for category utility, and the role of the multiplicand $\tfrac{1}{p}$ as a crude overfitting control, are given in the above sources. Loosely, the term $p(c_j) \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik} \mid c_j)^2$ is the expected number of attribute values that can be correctly guessed by an observer using a probability-matching strategy together with knowledge of the category labels, while $p(c_j) \sum_{f_i \in F} \sum_{k=1}^{m} p(f_{ik})^2$ is the expected number of attribute values that can be correctly guessed by an observer using the same strategy but without any knowledge of the category labels. Their difference therefore reflects the relative advantage accruing to the observer by having knowledge of the category structure.
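As a minimal sketch of how the probability-theoretic formula might be evaluated, the following Python function estimates every probability by relative frequency in a small labelled data set. The function name `category_utility`, the helper `sum_sq`, and the toy data are illustrative assumptions and do not come from the cited sources.

```python
from collections import Counter


def category_utility(rows, labels):
    """Estimate CU(C, F) for a labelled data set (illustrative sketch).

    rows   -- list of equal-length tuples of discrete feature values
    labels -- list of category labels, one per row
    Assumption: all probabilities are estimated as relative frequencies.
    """
    n_rows = len(rows)
    n_features = len(rows[0])

    def sum_sq(subset):
        # Sum over features i and values k of p(f_ik)^2 within `subset`.
        # Values never observed have probability 0 and contribute nothing.
        total = 0.0
        for i in range(n_features):
            counts = Counter(row[i] for row in subset)
            total += sum((c / len(subset)) ** 2 for c in counts.values())
        return total

    # Second inner term: marginal probabilities, ignoring category labels.
    baseline = sum_sq(rows)

    # Group the rows by category label.
    by_category = {}
    for row, label in zip(rows, labels):
        by_category.setdefault(label, []).append(row)

    cu = 0.0
    for members in by_category.values():
        p_c = len(members) / n_rows            # p(c_j)
        cu += p_c * (sum_sq(members) - baseline)
    return cu / len(by_category)               # the 1/p multiplicand


rows = [(0, 0), (0, 0), (1, 1), (1, 1)]
print(category_utility(rows, ["a", "a", "b", "b"]))  # 0.5
print(category_utility(rows, ["a", "b", "a", "b"]))  # 0.0
```

In the first call the categories coincide with the feature values, so knowing the label lets the observer guess every attribute correctly. In the second call the labels carry no information about the features, and the measure is zero.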

== Information-theoretic definition of category utility ==

The information-theoretic definition of category utility for a set of entities with a size-$n$ binary feature set $F = \{f_i\},\ i = 1 \ldots n$, and a binary category $C = \{c, \bar{c}\}$ is given in Gluck & Corter (1985) as follows:

$$CU(C,F) = \left[ p(c) \sum_{i=1}^{n} p(f_i \mid c) \log p(f_i \mid c) + p(\bar{c}) \sum_{i=1}^{n} p(f_i \mid \bar{c}) \log p(f_i \mid \bar{c}) \right] - \sum_{i=1}^{n} p(f_i) \log p(f_i)$$
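The information-theoretic expression can be evaluated term by term. The sketch below is one possible reading, under several assumptions that are not stated in the source: logarithms are taken to base 2, $0 \log 0$ is treated as 0, and the marginal $p(f_i)$ is obtained from the conditionals by the law of total probability. The names `plogp` and `category_utility_info` are hypothetical.

```python
import math


def plogp(p):
    # Assumed convention: 0 * log 0 is taken to be 0.
    # Assumed base: 2 (the source writes "log" without a base).
    return p * math.log2(p) if p > 0 else 0.0


def category_utility_info(p_c, p_f_given_c, p_f_given_not_c):
    """Evaluate the binary-category, binary-feature CU expression.

    p_c             -- p(c), probability of the category
    p_f_given_c     -- list of p(f_i | c), one entry per feature
    p_f_given_not_c -- list of p(f_i | c-bar), one entry per feature
    """
    p_not_c = 1.0 - p_c

    # Marginal p(f_i) = p(c) p(f_i | c) + p(c-bar) p(f_i | c-bar).
    p_f = [p_c * a + p_not_c * b
           for a, b in zip(p_f_given_c, p_f_given_not_c)]

    # Bracketed term: category-weighted sums of p log p.
    conditional = (p_c * sum(plogp(a) for a in p_f_given_c)
                   + p_not_c * sum(plogp(b) for b in p_f_given_not_c))

    # Subtracted term: the same sum over marginal probabilities.
    marginal = sum(plogp(p) for p in p_f)
    return conditional - marginal


# One feature that is present exactly when the object belongs to c.
print(category_utility_info(0.5, [1.0], [0.0]))  # 0.5
# One feature that is independent of the category.
print(category_utility_info(0.5, [0.5], [0.5]))  # 0.0
```

With a different logarithm base the values scale by a constant factor, so comparisons between category structures are unaffected by that choice.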