README for the package of Rigorous Support Vector Machines (RSVM)
Author: Jinbo Bi (bij2@rpi.edu) 2/20/2003
This program provides a very preliminary implementation of Vapnik's Rigorous Support Vector Machines (RSVM)
(Vladimir N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc. 1998). A modification of SVM_light
or SVMTorch (or even SMO-like software) may produce a more efficient and complete implementation. This
program aims to provide a simple solver for RSVM so that we can evaluate RSVM and compare it to classic
SVMs from the perspective of generalization ability. The decomposition algorithm (chunking scheme)
developed for classic SVMs can be applied, with slight modifications, to solving large-scale RSVM problems.
Hence existing solvers for classic SVMs can be adapted to RSVM by choosing a proper optimizer for the
subproblems.
To solve the smaller subproblems in the decomposition scheme, this program implements a gradient descent
method in which each iteration finds a feasible descent direction and performs a line search along that
direction. Currently the program handles only classification problems, and the kernel options are
polynomials and RBFs. Since the original goal was to produce a simple solver, the chunking scheme was
wrapped around the solver afterwards and is therefore not implemented very efficiently. We tested this
program on the MNIST database of handwritten digits (available at http://yann.lecun.com/exdb/mnist/index.html).
It runs at acceptable speed on data sets of around 2000 examples with dimension about 1000. This program
can be compiled on IBM AIX 4.3 and Sun Solaris (using g++). With proper organization, it can also be
compiled under Windows with Visual C++.
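The iteration structure just described (find a descent direction, then line-search along it) can be illustrated with a small sketch. The actual solver (newsvm.cpp, direction.cpp, linesearch.cpp) enforces the RSVM constraints when choosing the direction; the sketch below drops the constraints and simply minimizes an unconstrained quadratic W(a) = 0.5*a'Qa - e'a by steepest descent with an exact line search, so it illustrates only the shape of the iteration, not the RSVM dual itself:

```cpp
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

static Vec matvec(const Mat& Q, const Vec& a) {
    Vec r(a.size(), 0.0);
    for (size_t i = 0; i < Q.size(); ++i)
        for (size_t j = 0; j < Q[i].size(); ++j)
            r[i] += Q[i][j] * a[j];
    return r;
}

static double dot(const Vec& x, const Vec& y) {
    double s = 0.0;
    for (size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// Minimize W(a) = 0.5*a'Qa - e'a by steepest descent; each iteration takes
// the descent direction d = -grad and an exact line search step along it.
Vec descend(const Mat& Q, Vec a, double tol = 1e-8, int maxIter = 1000) {
    for (int it = 0; it < maxIter; ++it) {
        Vec g = matvec(Q, a);
        for (size_t i = 0; i < g.size(); ++i) g[i] -= 1.0;   // grad = Qa - e
        if (std::sqrt(dot(g, g)) < tol) break;               // gradient ~ zero
        Vec d(g.size());
        for (size_t i = 0; i < d.size(); ++i) d[i] = -g[i];
        double t = -dot(g, d) / dot(d, matvec(Q, d));        // exact step for a quadratic
        for (size_t i = 0; i < a.size(); ++i) a[i] += t * d[i];
    }
    return a;
}
```

For a strictly convex quadratic the exact step has the closed form above; the package's linesearch.cpp also supports an inexact search.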
The RSVM.zip contains the following files:
Source Codes:
1. Makefile -- compiles the C++ source files
2. Rsvm.cpp -- contains the main function for training RSVM on training data.
3. Rsvmtest.cpp -- contains the main function for testing an RSVM model obtained by Rsvm.
4. chunk.cpp (chunk.h) -- contains chunking-related code, including evaluation of the termination criteria
5. direction.cpp (direction.h) -- code for finding the search direction (gradient descent)
6. linesearch.cpp (linesearch.h) -- code for the (exact or inexact) line search along the obtained search direction
7. newsvm.cpp (newsvm.h) -- calls functions in direction.cpp and linesearch.cpp to perform the gradient descent method that
solves RSVM
8. valuecomput.cpp (valuecomput.h) -- contains all required function evaluations, such as the Lagrangian function evaluation,
gradient evaluation, etc., required by the other cpp files.
9. kernel.cpp (kernel.h) -- computes or prepares the kernel matrix used in no-chunking or chunking mode.
10. getin.cpp (getin.h) -- I/O-related functions to read in data and output models.
Sample data files:
11. toy.txt -- 9 examples in 2-dimensional space with 5 examples labelled as +1 and 4 examples labelled as -1.
12. toykernel.txt -- the kernel matrix computed as x_i'x_j+1 on the 9 examples in toy.txt.
13. toytest.txt -- 9 examples in 2-dimensional space with 5 examples in one class (+1) and 4 examples in the other class (-1).
Others:
14. README -- this README file
Installation:
Download RSVM.zip; running
unzip RSVM.zip
generates a directory RSVM containing all source code and sample data.
1. Unix:
1.1 Executing
make all
generates the Rsvm (for training) and Rsvmtest (for testing) executable programs
and all object files (with extension .o)
1.2 Executing
make Rsvm
generates Rsvm executable file for training RSVM on training data and corresponding
intermediate files
1.3 Executing
make Rsvmtest
generates the Rsvmtest executable file, which tests a model obtained by running Rsvm
on test data, and the corresponding intermediate files
1.4 Executing
make cc
eliminates all intermediate files (.o files) to save space
1.5 Executing
make clean
eliminates all intermediate files (.o files) and executable programs (Rsvm and Rsvmtest)
2. Windows Visual C++ 6.0:
2.1 Generate a project called Rsvm (Rsvm.dsw -- workspace file, Rsvm.dsp -- project file), and
add the Rsvm.cpp (main function), chunk.cpp, direction.cpp, getin.cpp, kernel.cpp, linesearch.cpp,
newsvm.cpp and valuecomput.cpp files to the project. Go to Build and build the executable file
Rsvm.exe. You may see warnings about unreferenced variables.
2.2 Generate a project called Rsvmtest (Rsvmtest.dsw and Rsvmtest.dsp are the related files), and add
the Rsvmtest.cpp, getin.cpp and kernel.cpp files to the project. Build the executable file Rsvmtest.exe.
You may see warnings about unreferenced variables.
Program Usage:
The commands introduced here work under Unix.
1. Running the command
Rsvm
without any arguments prints the following help information:
Usage: Rsvm -i(-k) train_data [options] -o out_model -h VC-dim -t kernel_type -p kerpar -q chunk_size -I Iterations
The input train_data is an argument required
if -i option is used, train_data is the input data matrix
if -k option is used, train_data is the kernel matrix computed
on the input data
Other options have default values
-o output SVM model after training, default: no output
-h parameter for capacity control, the square root of VC
dimension, default: VC-dim=1.0
-t kernel type, 1: RBF, 2: polynomial, default: polynomial
-p parameter value used in kernel function, sigma in RBF,
or polynomial order, default: polynomial order=1.0
-q maximal size of subproblems if using chunking,
default: no chunking with subproblem size=0
-I the maximum number of chunking iterations,
default: maximum iteration=2000
2. Rsvm -i toy.txt -o toy.mod -h 2 -t 2 -p 1.0
This command runs the Rsvm program on the "toy.txt" data using a polynomial kernel of
order 1.0 (i.e., a linear kernel) and a VC dimension equal to 4 (-h gives its square root),
and then outputs the RSVM model to the file "toy.mod".
The input file given with the -i option should have the following format:
The first row contains 3 numbers: the total number of examples, the number of positive examples,
and the number of negative examples.
All other rows share the same format:
the first column -- example ID, such as 1, 2, 3, ... (currently the program can only read in numbers).
the second column -- example label (currently only classification problems, so it is +1 or -1).
all other columns -- example attributes (must be numeric).
A sample input data file "toy.txt" looks like:
9 5 4
1 1 -6.9122254e-01 7.2264196e-01
2 1 -9.7383536e-01 2.2725467e-01
3 1 9.9891263e-01 4.6621342e-02
4 1 8.9201459e-01 4.5200660e-01
5 1 1.1566454e-01 9.9328833e-01
6 -1 -7.1751210e-01 -6.9654604e-01
7 -1 1.2084689e-01 -9.9267116e-01
8 -1 9.9500000e-01 -1.0000000e-01
9 -1 -8.0124688e-01 -5.9833389e-01
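As a concrete illustration of this format, the sketch below parses such a file from a stream. The names DataSet and parseInput are ours, introduced only for illustration; the package's real reader lives in getin.cpp:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct DataSet {
    int nTotal = 0, nPos = 0, nNeg = 0;
    std::vector<int> id;
    std::vector<int> label;                  // +1 or -1
    std::vector<std::vector<double>> x;      // attribute columns
};

// Parse the -i input format: a header row "total pos neg" followed by one
// row per example: ID, label, then the attribute values.
DataSet parseInput(std::istream& in) {
    DataSet d;
    in >> d.nTotal >> d.nPos >> d.nNeg;
    std::string line;
    std::getline(in, line);                  // consume the rest of the header
    while (std::getline(in, line)) {
        std::istringstream row(line);
        int id, lab;
        if (!(row >> id >> lab)) continue;   // skip blank lines
        d.id.push_back(id);
        d.label.push_back(lab);
        std::vector<double> attrs;
        for (double v; row >> v; ) attrs.push_back(v);
        d.x.push_back(attrs);
    }
    return d;
}
```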
On-screen results look like:
Rsvm -i toy.txt -o toy.mod -h 2 -t 2 -p 1.0
[toy.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
Chunking 1 ... ...
C_all= 1.206341e+00, intercept= 0.101296
W_value= 0.196671
Iter= 1 Norm = 2.165170 W_value = -0.505143
Iter= 2 Norm = 2.081954 W_value = -0.813631
Iter= 3 Norm = 2.283670 W_value = -1.045971
Iter= 4 Norm = 1.767290 W_value = -1.063918
Iter= 5 Norm = 1.428970 W_value = -1.308528
Iter= 6 Norm = 2.594916 W_value = -1.661511
Iter= 7 Norm = 1.326681 W_value = -1.688193
Iter= 8 Norm = 0.818350 W_value = -1.757644
Iter= 9 Norm = 0.957962 W_value = -1.824766
Iter= 10 Norm = 1.237811 W_value = -1.853912
Iter= 11 Norm = 0.333722 W_value = -1.868752
Iter= 12 Norm = 0.336182 W_value = -1.885240
Iter= 13 Norm = 0.333722 W_value = -1.901763
Iter= 14 Norm = 0.336182 W_value = -1.920120
Iter= 15 Norm = 0.333722 W_value = -1.938516
Iter= 16 Norm = 0.336182 W_value = -1.958953
Iter= 17 Norm = 0.333722 W_value = -1.979434
Iter= 18 Norm = 0.336182 W_value = -2.002188
Iter= 19 Norm = 0.333722 W_value = -2.024991
Iter= 20 Norm = 0.336182 W_value = -2.025256
Iter= 21 Norm = 0.306435 W_value = -2.026858
Iter= 22 Norm = 0.180370 W_value = -2.033201
Iter= 23 Norm = 0.000000 Gradient becomes zero
W_value = -2.033201
Chunking 2 ... ...
The optimal solution has been obtained!!!
Optimal, back to main function ...
Sum_alpha = 4.0950278277
C = 1.940027
largrange multip = 0.259166
intercept = -0.259166
Write out model file ...
3. Rsvm -k toykernel.txt -o toy.mod -h 2 -t 2 -p 1.0
This command performs the same function as the command above. The only
difference is that the input file here is not a raw data file but a kernel matrix in
the following format:
The first row contains 3 numbers: the total number of examples, the number of positive examples,
and the number of negative examples.
All other rows share the same format:
the first column -- example ID, such as 1, 2, 3, ... (currently the program can only read in numbers).
the second column -- example label (currently only classification problems, so it is +1 or -1).
all other columns -- the corresponding columns of the kernel matrix
A sample input kernel file "toykernel.txt" looks like:
9 5 4
1 1 2 1.83736 0.34322 0.710058 1.63784 -0.992607 -0.199122 -0.239969 -1.12146
2 1 1.83736 2 0.0378185 0.234045 1.11309 -1.54045 -0.656726 -0.00830835 -1.64431
3 1 0.34322 0.0378185 2 1.91212 1.16185 -0.250794 -1.07444 -1.98926 -0.171729
4 1 0.710058 0.234045 1.91212 2 1.55215 -0.0451253 -0.659103 -1.84235 -0.0148252
5 1 1.63784 1.11309 1.16185 1.55215 2 -0.225138 -0.027969 -1.01576 -0.313006
6 -1 -0.992607 -1.54045 -0.250794 -0.0451253 -0.225138 2 1.60473 0.35573 1.99167
7 -1 -0.199122 -0.656726 -1.07444 -0.659103 -0.027969 1.60473 2 1.21951 1.49712
8 -1 -0.239969 -0.00830835 -1.98926 -1.84235 -1.01576 0.35573 1.21951 2 0.262593
9 -1 -1.12146 -1.64431 -0.171729 -0.0148252 -0.313006 1.99167 1.49712 0.262593 2
On-screen results will be exactly the same as those pasted above for
the first command.
4. Rsvmtest toy.mod toy.txt toytest.txt toy.res
This command runs the Rsvm test program. It reads in the training data "toy.txt" and
the test data "toytest.txt", then tests the obtained RSVM model "toy.mod" on "toytest.txt".
On-screen results look like:
Rsvmtest toy.mod toy.txt toytest.txt toy.res
[toy.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
Correctly read in model parameters
[toytest.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
1 1 1 1.82233 0
2 1 1 0.886069 0.113931
3 1 1 0.173331 0.826669
4 -1 1 0.999989 1.99999
5 1 1 2.21712 0
6 1 -1 -1.00001 2.00001
7 -1 -1 -1.73988 0
8 -1 -1 -1.02295 0
9 -1 -1 -0.789392 0.210608
Result files:
1. Rsvm generates two result files if the -o option is given. For example, after running Rsvm
on the "toy.txt" data, we obtain two files, "alpha.mod" and "toy.mod" (the user-defined name).
The "toy.mod" (a one-dimensional array) contains the coefficients corresponding to each
training example, followed by the intercept, an integer indicating which type of kernel
was used, and the value of the kernel related parameter.
The toy.mod file looks like:
0
1.94002705147
1.94002705147
0.0921782781527
0
0.0921782781527
0
1.94002705147
1.94002705147
-0.259166294177
2
1.0
The "alpha.mod" (a one-dimensional array) amounts to a log file which contains the running
information and the real dual solution alpha (the coefficients in "toy.mod" are alpha divided
by a scalar gamma, see Vapnik's Statistical Learning Theory).
The first integer indicates how the program terminates:
1 -- optimal solution has been obtained;
2 -- program stops, ill conditioned, H may be too large;
3 -- KKT conditions are not satisfied with required precision.
The second integer tells how many subproblems have been optimized within the last chunking iteration.
The third integer provides the number of chunking iterations performed.
The fourth number is the value of alpha'*Q*alpha where Q is the matrix induced by the kernel
matrix and the vector of labels y, Q_ij = y_i*y_j*k(x_i,x_j).
All other numbers are dual solution alpha.
The alpha.mod file looks like:
1
22
1
1.0627825619e+00
0.000000
1.000000
1.000000
0.047514
0.000000
0.047514
0.000000
1.000000
1.000000
2. Rsvmtest generates a result file if an output file name is given. For example, after testing "toy.mod" on the
"toytest.txt" data, we obtain a file "toy.res" (if users specify this name; other names can be
used). The "toy.res" file contains the predictions for the test examples, with each row corresponding to
a test example. The first number is the example ID. The second is the true label, and the third
number is the predicted label. The fourth number is the predicted value of the obtained function,
and the fifth number is the value of the corresponding slack variable.
The toy.res file looks like:
1 1 1 1.8223 0
2 1 1 0.886046 0.113954
3 1 1 0.173358 0.826642
4 -1 1 1 2
5 1 1 2.21709 0
6 1 -1 -1 2
7 -1 -1 -1.73984 0
8 -1 -1 -1.02291 0
9 -1 -1 -0.789385 0.210615