The second step is to find optimal values for C and γ, where C is the penalty
parameter from the original SVM formulation and γ is the constant in the
Radial Basis Function kernel. LIBSVM features a tool programmed in Python,
easy.py, that finds good values for C and γ. The two steps described above are
time-consuming. Step two in particular takes from five to ten minutes on a
somewhat fast computer (AMD Athlon 1.66GHz), depending on the data set. In the
experiments, LIBSVM was tested both with the two preparatory steps above and
without any preparatory steps.
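The parameter search in step two can be illustrated with a minimal sketch of a grid search over (C, γ) for the RBF kernel K(x, y) = exp(-γ ||x - y||²). This is not the code of easy.py. The function names, the `evaluate` callback (for example, a cross-validation accuracy supplied by the caller) and the exponential search ranges are assumptions made for illustration only.

```python
import math
from itertools import product


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def exponential_grid(log2_start, log2_stop, log2_step):
    """Return 2**e for e = start, start + step, ..., stop (inclusive)."""
    count = int((log2_stop - log2_start) / log2_step) + 1
    return [2.0 ** (log2_start + i * log2_step) for i in range(count)]


def grid_search(evaluate, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_score, C, gamma).

    `evaluate(C, gamma)` is a hypothetical callback that trains and scores
    a classifier; higher scores are treated as better.
    """
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = evaluate(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


# Assumed search ranges (powers of two); not taken from the experiments.
C_VALUES = exponential_grid(-5, 15, 2)
GAMMA_VALUES = exponential_grid(-15, 3, 2)
```

Because every pair in the grid requires training and scoring a classifier, the cost grows with the product of the two grid sizes, which is consistent with step two being the slower of the two preparatory steps.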
2.2 Experiments on UCI datasets
This section presents results from classification experiments performed on three
different datasets from the UCI Machine Learning Repository:
1. Waveform Generator database - Generated by a C program. Each class
in the dataset is generated from a combination of 2 "base" waves with
added noise. The 5000 examples have 40 numeric attributes and 3 classes.
2. Image Segmentation Database - Data drawn randomly from a database
of seven outdoor images. The images were hand-segmented to create a
classification for every pixel. Each instance is a 3x3 region described by
19 attributes. There are 7 classes in the dataset, named brickface, sky,
foliage, cement, window, path and grass. The dataset contains 2310 examples.
3. Letter Recognition Database - Data with 20,000 examples of black-and-white
rectangular pixel displays of the capital letters in the English alphabet;
hence the number of classes is 26, one for each letter. Each example
has 16 numerical features.
It was also planned to use the Forest Covertype data in the experiment, but
it was too memory intensive for Weka to handle.
Classification Accuracy - UCI Datasets. As we can see from the results
in Figure 1, MIPSVM performs comparably to the other classifiers in terms of
classification accuracy on the Waveform and Image Segmentation datasets. On
the Letter Recognition dataset it performs considerably worse than the other
classifiers. This is likely because MIPSVM lacks the balancing mechanisms that
one-against-the-rest classifiers may benefit from when there are many classes.
Another reason could be that the letter dataset might not be linearly
separable.
Computational Performance - UCI Datasets. The computational efficiency
is measured in seconds of running time. The experiment was run on an AMD Dual
Athlon MP 2100+ (1.66GHz) with 2GB RAM. Each configuration was run 10 times,
and the average running time is used as the result. The figure shows the
runtime relative to MIPSVM.
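The measurement procedure described here can be sketched as follows. This is a minimal illustration, not the harness used in the experiments: the function names and the `run` callable (one training-and-classification run of a configuration) are hypothetical, and wall-clock timing via `time.perf_counter` is an assumption.

```python
import time


def average_runtime(run, repeats=10):
    """Call `run` `repeats` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        total += time.perf_counter() - start
    return total / repeats


def relative_runtimes(average_seconds, baseline="MIPSVM"):
    """Divide each classifier's average runtime by the baseline's average runtime."""
    base = average_seconds[baseline]
    return {name: seconds / base for name, seconds in average_seconds.items()}
```

With this normalization the baseline always has the value 1.0, and a value of 2.5 for another classifier means it took two and a half times as long as MIPSVM on average.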
Empirical Comparison of MIPSVM with existing classifiers
