|
Equational Approaches
The underlying method of pattern expression in these systems is "surface construction" rather than logical expression or co-occurrence counts. Such systems usually use a set of equations to define a "surface" within a numeric space, then measure distances from this surface for prediction.
The best-known example of such a surface is a straight line in a two-dimensional space, as in Figure 10a. This has the simple equation Y = (a * X) + b and leads to the well-known approach of linear regression in statistics. As the parameter "a" varies in this equation, the slope of the line changes; the parameter "b" sets where the line crosses the Y axis.
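As a rough illustration of how such a line can be obtained from data, the sketch below fits Y = (a * X) + b by ordinary least squares and then measures the vertical distance of a new point from the fitted line. The function names fit_line and residual, and the sample points, are hypothetical and chosen only for this example; they do not come from any particular product.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = (a * X) + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


def residual(a, b, x, y):
    """Vertical distance of the point (x, y) from the line Y = (a * X) + b."""
    return y - (a * x + b)


# Made-up points that lie exactly on Y = (2 * X) + 1 (an assumption for the demo).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
prediction = a * 5.0 + b            # predicted Y for a new X of 5
distance = residual(a, b, 2.0, 6.0)  # how far the point (2, 6) sits above the line
```

With real data the points rarely fall exactly on the line, so the residuals are non-zero and their size gives a sense of how well the surface describes the data.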
a) Regression line
Y = (a * X) + b
b) Parabolic equation
Y = X^2
c) Inverse equation
Y = 1 / X
Figure 10.
Linear regression works well when the points to be approximated lie close to a straight line. As Figures 10b and 10c show, it is also possible to use non-linear equations to approximate smoother, curved surfaces.
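One common way to fit curves such as those in Figures 10b and 10c is to transform X first and then apply the same least squares procedure used for a straight line. The sketch below assumes this approach; the helper names fit_transformed and sum_squared_error and the sample points are hypothetical. It also fits a plain straight line to the parabolic points to show the error that a poorly matched surface leaves behind.

```python
def fit_transformed(xs, ys, transform):
    """Least squares fit of Y = (a * transform(X)) + b; returns (a, b)."""
    ts = [transform(x) for x in xs]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    a = sty / stt
    b = mean_y - a * mean_t
    return a, b


def sum_squared_error(xs, ys, transform, a, b):
    """Total squared vertical distance of the points from the fitted surface."""
    return sum((y - (a * transform(x) + b)) ** 2 for x, y in zip(xs, ys))


# Made-up sample points (an assumption for the demo).
xs = [1.0, 2.0, 4.0, 5.0]
parabola_ys = [x ** 2 for x in xs]   # points on Y = X^2
inverse_ys = [1.0 / x for x in xs]   # points on Y = 1 / X

square = lambda x: x ** 2
inverse = lambda x: 1.0 / x
identity = lambda x: x

a_par, b_par = fit_transformed(xs, parabola_ys, square)
a_inv, b_inv = fit_transformed(xs, inverse_ys, inverse)

# A straight line fitted to the parabolic points leaves a clear error.
a_line, b_line = fit_transformed(xs, parabola_ys, identity)
sse_line = sum_squared_error(xs, parabola_ys, identity, a_line, b_line)
sse_par = sum_squared_error(xs, parabola_ys, square, a_par, b_par)
```

The design choice here is that the equation stays linear in the parameters "a" and "b" even though it is non-linear in X, which is why the straight-line arithmetic can be reused.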
When the surfaces are even more complex (e.g., Y = X^2 + X + (1 / X)), or when there are several dimensions, the ability of humans to understand the equations and surfaces decreases rather quickly. The system becomes opaque, or a "black box". However, it is still possible to construct such surfaces.
In fact, neural nets are known to be "universal approximators" in theory. Given enough hidden units, they can come very close to any continuous function over a bounded range. However, present theory does not specify the practical limits of nets for achieving such approximation on large data sets, and most neural net implementations rely on sampling.
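The "in theory" part of this claim can be illustrated with a small hand-built net. The sketch below is not a trained network and is not taken from any neural net product: it assumes a single hidden layer of units that output max(0, z), and sets the weights directly so that the net matches a target function at evenly spaced points. All names (build_net, net_output, max_error) are hypothetical. Adding hidden units brings the net closer to the target, here a sine curve.

```python
import math


def hidden_unit(z):
    """Output of one hidden unit: zero below its threshold, linear above it."""
    return z if z > 0.0 else 0.0


def build_net(f, lo, hi, hidden_units):
    """Set weights by hand so the net matches f at evenly spaced knots.

    Returns (bias, knots, weights) for
        net(x) = bias + sum of w * hidden_unit(x - k) over all knots k.
    """
    step = (hi - lo) / hidden_units
    knots = [lo + i * step for i in range(hidden_units)]
    values = [f(lo + i * step) for i in range(hidden_units + 1)]
    slopes = [(values[i + 1] - values[i]) / step for i in range(hidden_units)]
    # Each hidden unit contributes the change in slope needed at its knot.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1]
                             for i in range(1, hidden_units)]
    return values[0], knots, weights


def net_output(x, bias, knots, weights):
    return bias + sum(w * hidden_unit(x - k) for k, w in zip(knots, weights))


def max_error(f, lo, hi, hidden_units, samples=1000):
    """Largest gap between f and the net on a fine grid of test points."""
    bias, knots, weights = build_net(f, lo, hi, hidden_units)
    grid = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - net_output(x, bias, knots, weights)) for x in grid)


coarse = max_error(math.sin, 0.0, math.pi, 4)    # 4 hidden units
fine = max_error(math.sin, 0.0, math.pi, 32)     # 32 hidden units
```

This says nothing about how hard it is to find good weights from sampled data, which is the practical limit the text refers to.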
The equational approaches almost always require all of the data to be numeric. Non-numeric data must be "coded" into numbers (the reverse of what cross-tabs do). This often causes a number of problems, as discussed below.
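To make the idea of "coding" concrete, the sketch below shows two common ways of turning a non-numeric field into numbers. The function names and the color values are hypothetical, and the example is only an illustration of one frequently cited difficulty, which may or may not be among those the article goes on to discuss.

```python
def integer_code(values):
    """Map each distinct category to an integer, in order of first appearance."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
    return [codes[v] for v in values], codes


def one_hot_code(values):
    """Map each value to a row of 0/1 indicators, one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Made-up non-numeric field (an assumption for the demo).
colors = ["red", "green", "blue", "green"]

int_codes, mapping = integer_code(colors)
hot_codes, columns = one_hot_code(colors)

# Integer coding places "blue" twice as far from "red" as "green" is,
# a distance that exists only in the coding, not in the data.
gap_red_blue = abs(mapping["blue"] - mapping["red"])
gap_red_green = abs(mapping["green"] - mapping["red"])
```

Integer codes keep the data compact but impose an ordering that an equation will treat as meaningful; indicator columns avoid this at the cost of adding one dimension per category.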
Copyright (C) 1997, Journal of Data Warehousing, December 1997 |