Handle Dependency between Features - Some classifiers perform poorly when
features are (somewhat) dependent. In Web Intelligence problems where the
features are clicks in a click-stream, the features are likely to be (somewhat)
dependent, and this has to be handled by the classifier algorithm.
Naive Bayes is frequently selected as the default classifier in Web
Intelligence; it has been used for e-mail spam detection, text classification, web
search and recommender systems. It is relatively scalable (mostly computationally
cheap table operations), but it uses a lot of memory when features
are continuous or have many value levels. The memory requirements of Naive
Bayes can be reduced using discretization techniques, but this may reduce
classification accuracy. Naive Bayes also assumes that feature values are
conditionally independent given the target value (Mitchell [1997]). So how can
we develop classifiers that meet the above requirements?
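The memory remark about Naive Bayes can be made concrete with a small sketch. It assumes a count-table implementation in which one counter is stored per (feature, value, class) combination; the helper names `train_counts` and `discretize`, the equal-width binning and the toy data are illustrative assumptions, not taken from the text.

```python
from collections import defaultdict


def train_counts(rows, labels):
    """Build Naive Bayes count tables from categorical (or raw) feature values."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)  # key: (feature index, value, class label)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
    return class_counts, value_counts


def discretize(x, width):
    """Map a continuous value to the index of an equal-width bin (assumed scheme)."""
    return int(x // width)


# Toy data: one continuous feature, two classes (hypothetical values).
rows = [[0.11], [0.52], [0.58], [1.93], [2.47], [2.51]]
labels = ["a", "a", "a", "b", "b", "b"]

_, raw = train_counts(rows, labels)
binned_rows = [[discretize(v, 1.0) for v in row] for row in rows]
_, binned = train_counts(binned_rows, labels)

# Number of table cells that must be stored before and after discretization.
print(len(raw), len(binned))
```

With raw continuous values almost every observation opens a new table cell, whereas binning collapses them into a few cells. The price is that values falling into the same bin become indistinguishable, which is one way the accuracy loss mentioned above can arise.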
3.3.7 Incremental Proximal Support Vector Classifiers
The theory of least-squares regression was first published by Adrien-Marie
Legendre in 1805, and it was further developed into a statistical tool based on
probability theory by Carl Friedrich Gauss in 1809 (Cristianini and Shawe-Taylor
[2000]).
Least-squares regression performs poorly when the training vectors are
non-orthogonal, i.e. if two distinct vectors $X_j$, $X_k$ satisfy the relation
$X_j = a - bX_k$, where $a$ and $b$ are scalars (Upton and Cook [2002]). Ridge
regression was introduced to deal with non-orthogonal least-squares regression
problems (Hoerl and Kennard [1970]).
The basic idea of ridge regression is to add a ridge parameter to
the diagonal of the matrix formed from the training vectors in least-squares
regression. If the ridge parameter is positive, the new matrix is guaranteed to be
non-singular, i.e. it yields a regression system that can be solved. Ridge regression
is also considered a type of shrinkage regression.
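The effect of the ridge parameter can be sketched on a tiny example. The sketch assumes the primal normal equations, where the matrix in question is X^T X and the ridge parameter `lam` is added to its diagonal; the two-feature restriction, the helper names and the toy data are assumptions made for illustration only.

```python
def gram(X):
    """Return the 2x2 matrix X^T X for a list of two-feature rows."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    return [[a, b], [b, d]]


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Cramer's rule (two features only)."""
    G = gram(X)
    G[0][0] += lam  # ridge parameter on the diagonal
    G[1][1] += lam
    det = det2(G)
    if det == 0:
        raise ZeroDivisionError("singular system: no unique solution")
    c0 = sum(r[0] * t for r, t in zip(X, y))
    c1 = sum(r[1] * t for r, t in zip(X, y))
    w0 = (c0 * G[1][1] - G[0][1] * c1) / det
    w1 = (G[0][0] * c1 - G[1][0] * c0) / det
    return w0, w1


# The second feature is exactly twice the first, so X^T X is singular.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [5.0, 10.0, 15.0]

print(det2(gram(X)))         # determinant without a ridge parameter
print(ridge_fit(X, y, 1.0))  # weights once the ridge parameter is added
```

Calling `ridge_fit(X, y, 0.0)` on this data raises the singular-system error, which corresponds to plain least squares failing. With `lam = 1.0` the system has a unique solution whose weights are slightly smaller than an exact fit would require, which is the shrinkage behaviour mentioned above.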
Ridge regression is a specialization of Tikhonov regularization using a square loss
function (Tikhonov [1963]; Tikhonov and Arsenin [1977]); this also explains the
synonym for ridge regression, regularized least squares regression (RLSR).
Other specializations of Tikhonov regularization include support vector machine
regression (SVMR), using an ε-insensitive loss function, and support vector
machine classification (SVMC), using a hinge loss function (Vapnik [1999];
Cristianini and Shawe-Taylor [2000]; Rifkin [2002]).
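For orientation, the three methods can be written as instances of one regularized objective. The notation below (training pairs $(x_i, y_i)$, loss $V$, regularization parameter $\lambda$, function space $\mathcal{H}$ with norm $\|\cdot\|_K$) is a commonly used formulation and is an assumption here, not notation taken from the text.

```latex
% Tikhonov regularization: data-fit term plus a norm penalty
\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \lambda \, \|f\|_K^2

% Square loss (RLSR / ridge regression)
V\bigl(y, f(x)\bigr) = \bigl(y - f(x)\bigr)^2

% \epsilon-insensitive loss (SVMR)
V\bigl(y, f(x)\bigr) = \max\bigl(0, \; |y - f(x)| - \epsilon\bigr)

% Hinge loss (SVMC), with labels y \in \{-1, +1\}
V\bigl(y, f(x)\bigr) = \max\bigl(0, \; 1 - y\, f(x)\bigr)
```

Only the loss $V$ changes between the three methods; the penalty term is shared, which is why they are grouped as specializations of the same regularization scheme.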
RLSR, SVMR and SVMC all support kernel mappings in order to handle nonlinear
regression (RLSR, SVMR) and classification (SVMC) problems. The purpose
of a kernel is to map the nonlinear classification problem into a higher
dimensional space where it becomes a linear classification problem. Regularized
