Rule Induction
Rule induction is the process of looking at a data set and generating patterns. By automatically exploring the data set, as in Figure 5, the induction system forms hypotheses that lead to patterns.
Figure 5.
The process is in essence similar to what a human analyst would do in exploratory analysis. For example, given a database of demographic information, the induction system may first look at how ages are distributed and notice an interesting variation among people whose profession is listed as professional athlete. The system forms a hypothesis about this variation, tests it against the data and, if the hypothesis is found to be relevant, prints a rule such as:
IF Profession = Athlete THEN Age < 30
This rule may have a "confidence" of 70% attached to it. However, this pattern may not hold for the ages of bankers or teachers in the same database.
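As an illustration, the confidence of such a rule can be read as the fraction of records matching the IF part that also satisfy the THEN part. The sketch below assumes that reading; the function name `rule_confidence`, the field names and the sample records are hypothetical and are not taken from any particular induction system.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def rule_confidence(
    records: List[Record],
    antecedent: Callable[[Record], bool],
    consequent: Callable[[Record], bool],
) -> float:
    """Fraction of records matching the IF part that also match the THEN part."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return 0.0  # the rule never fires, so there is no evidence for it
    satisfied = sum(1 for r in matching if consequent(r))
    return satisfied / len(matching)


# Hypothetical demographic records, invented purely for illustration.
athlete_ages = [22, 24, 25, 26, 27, 28, 29, 31, 34, 38]
banker_ages = [28, 35, 42, 51]
people: List[Record] = (
    [{"profession": "Athlete", "age": a} for a in athlete_ages]
    + [{"profession": "Banker", "age": a} for a in banker_ages]
)


def under_30(record: Record) -> bool:
    return record["age"] < 30


# IF Profession = Athlete THEN Age < 30
athlete_conf = rule_confidence(
    people, lambda r: r["profession"] == "Athlete", under_30
)

# The same THEN part checked for bankers, where the pattern is much weaker.
banker_conf = rule_confidence(
    people, lambda r: r["profession"] == "Banker", under_30
)

print(f"Athlete rule confidence: {athlete_conf:.0%}")
print(f"Banker rule confidence:  {banker_conf:.0%}")
```

With these invented records, seven of the ten athletes are under 30, while only one of the four bankers is, which mirrors the point that a pattern found for one profession need not hold for another.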
We must also distinguish between fuzzy and inexact rules. Inexact rules often have a "fixed" confidence factor attached to them, i.e. each rule has a specific integer or percentage (such as 70%) representing its validity. The confidence in a fuzzy rule, however, can vary with the numeric values in the body of the rule; for instance, the confidence may be proportional to the age of a person, so that as the age varies, so does the confidence. In this way fuzzy rules can produce much more compact expressions of knowledge and lead to stable behavior.
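The distinction can be sketched in a few lines. The example below is only one possible reading: the inexact rule returns the same confidence for every record, while the fuzzy rule's confidence rises linearly with age between two bounds. The linear shape, the bounds of 20 and 60, and the function names are assumptions made for illustration, not a definition of fuzzy rules.

```python
def inexact_confidence(age: float) -> float:
    """Inexact rule: one fixed confidence, whatever the age in the rule body."""
    return 0.70


def fuzzy_confidence(age: float, low: float = 20.0, high: float = 60.0) -> float:
    """Fuzzy rule: confidence grows linearly with age and is clamped to [0, 1].

    The bounds `low` and `high` are hypothetical parameters chosen for this sketch.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (age - low) / (high - low)))


for age in (20, 30, 40, 60, 80):
    print(age, inexact_confidence(age), fuzzy_confidence(age))
```

A single fuzzy rule of this kind covers the whole age range, where a set of inexact rules would need a separate rule and confidence factor for each age band; this is one way to see why fuzzy rules can be more compact.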
Rule induction can discover very general rules that deal with both numeric and non-numeric data. Rules can also combine conditional and affinity statements into hybrid patterns. A key issue here is the ability to go beyond flat databases and deal with OLAP patterns (Parsaye, 1997).
Copyright (C) 1997, Journal of Data Warehousing, December 1997