README for the package of Rigorous Support Vector Machines (RSVM)
Author: Jinbo Bi (bij2@rpi.edu) 2/20/2003
This program provides a very preliminary implementation of Vapnik's Rigorous Support Vector Machines (RSVM)
(Vladimir N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc. 1998). A modification of SVM_light
or SVMTorch (or even SMO-like software) may produce a more efficient and complete implementation. This
program aims to provide a simple solver for RSVM so that we can evaluate RSVM and compare it to classic
SVMs from the perspective of generalization ability. The decomposition algorithm (chunking scheme)
developed for classic SVMs can be applied, with slight modifications, to solving large-scale RSVM problems.
Hence existing solvers for classic SVMs can be adapted to RSVM by choosing a proper optimizer for the
subproblems.
To solve the smaller subproblems in the decomposition scheme, this program implements a gradient descent
method in which each iteration finds a feasible descent direction and performs a line search along that
direction. Currently the program handles only classification problems, and the kernel options are
polynomials and RBFs. Since the original goal was to produce a simple solver, the chunking scheme was
wrapped around the solver afterwards and is therefore not implemented very efficiently. We tested this
program on the MNIST database of handwritten digits (available at http://yann.lecun.com/exdb/mnist/index.html).
It runs at acceptable speed on data sets of around 2000 examples with dimension about 1000. This program
can be compiled on IBM AIX 4.3 and Sun Solaris (using g++). With proper organization, it can also be
compiled under Windows with Visual C++.
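The iteration structure just described (find a descent direction, then line-search along it) can be illustrated with a small sketch. The actual solver (newsvm.cpp, direction.cpp, linesearch.cpp) enforces the RSVM constraints when choosing the direction; the sketch below drops the constraints and simply minimizes an unconstrained quadratic W(a) = 0.5*a'Qa - e'a by steepest descent with an exact line search, so it illustrates only the shape of the iteration, not the RSVM dual itself:

```cpp
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

static Vec matvec(const Mat& Q, const Vec& a) {
    Vec r(a.size(), 0.0);
    for (size_t i = 0; i < Q.size(); ++i)
        for (size_t j = 0; j < Q[i].size(); ++j)
            r[i] += Q[i][j] * a[j];
    return r;
}

static double dot(const Vec& x, const Vec& y) {
    double s = 0.0;
    for (size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// Minimize W(a) = 0.5*a'Qa - e'a by steepest descent; each iteration takes
// the descent direction d = -grad and an exact line search step along it.
Vec descend(const Mat& Q, Vec a, double tol = 1e-8, int maxIter = 1000) {
    for (int it = 0; it < maxIter; ++it) {
        Vec g = matvec(Q, a);
        for (size_t i = 0; i < g.size(); ++i) g[i] -= 1.0;   // grad = Qa - e
        if (std::sqrt(dot(g, g)) < tol) break;               // gradient ~ zero
        Vec d(g.size());
        for (size_t i = 0; i < d.size(); ++i) d[i] = -g[i];
        double t = -dot(g, d) / dot(d, matvec(Q, d));        // exact step for a quadratic
        for (size_t i = 0; i < a.size(); ++i) a[i] += t * d[i];
    }
    return a;
}
```

For a strictly convex quadratic the exact step has the closed form above; the package's linesearch.cpp also supports an inexact search.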
The RSVM.zip contains the following files:
Source Codes:
1. Makefile -- compiles the C++ source files
2. Rsvm.cpp -- contains the main function for training RSVM on training data.
3. Rsvmtest.cpp -- contains the main function for testing an RSVM model obtained by Rsvm.
4. chunk.cpp (chunk.h) -- contains chunking-related code, including evaluation of the termination criteria
5. direction.cpp (direction.h) -- code for finding the search direction (gradient descent)
6. linesearch.cpp (linesearch.h) -- code for the (exact or inexact) line search along the obtained search direction
7. newsvm.cpp (newsvm.h) -- calls functions in direction.cpp and linesearch.cpp to perform the gradient descent method that
solves RSVM
8. valuecomput.cpp (valuecomput.h) -- contains all required function evaluations, such as the Lagrangian function evaluation,
gradient evaluation, etc., required by the other cpp files.
9. kernel.cpp (kernel.h) -- computes or prepares the kernel matrix used in no-chunking or chunking mode.
10. getin.cpp (getin.h) -- I/O-related functions to read in data and output models.
Sample data files:
11. toy.txt -- 9 examples in 2-dimensional space with 5 examples labelled as +1 and 4 examples labelled as -1.
12. toykernel.txt -- the kernel matrix computed as x_i'x_j+1 on the 9 examples in toy.txt.
13. toytest.txt -- 9 examples in 2-dimensional space with 5 examples in one class (+1) and 4 examples in the other class (-1).
Others:
14. README -- this README file
Installation:
Download RSVM.zip; running
unzip RSVM.zip
generates a directory RSVM containing all source code and sample data.
1. Unix:
1.1 Executing
make all
generates the Rsvm (for training) and Rsvmtest (for testing) executable programs
and all object files (with extension .o)
1.2 Executing
make Rsvm
generates Rsvm executable file for training RSVM on training data and corresponding
intermediate files
1.3 Executing
make Rsvmtest
generates the Rsvmtest executable file, which tests a model obtained by running Rsvm
on test data, and the corresponding intermediate files
1.4 Executing
make cc
eliminates all intermediate files (.o files) to save space
1.5 Executing
make clean
eliminates all intermediate files (.o files) and executable programs (Rsvm and Rsvmtest)
2. Windows Visual C++ 6.0:
2.1 Generate a project called Rsvm (Rsvm.dsw -- workspace file, Rsvm.dsp -- project file), and
add the Rsvm.cpp (main function), chunk.cpp, direction.cpp, getin.cpp, kernel.cpp, linesearch.cpp,
newsvm.cpp and valuecomput.cpp files to the project. Go to Build and build the executable file
Rsvm.exe. You may see warnings about unreferenced variables.
2.2 Generate a project called Rsvmtest (Rsvmtest.dsw and Rsvmtest.dsp are the related files), and add
the Rsvmtest.cpp, getin.cpp and kernel.cpp files to the project. Build the executable file Rsvmtest.exe.
You may see warnings about unreferenced variables.
Program Usage:
The commands introduced here work under Unix.
1. Running the command
Rsvm
without any arguments prints the following help information:
Usage: Rsvm -i(-k) train_data [options] -o out_model -h VC-dim -t kernel_type -p kerpar -q chunk_size -I Iterations
The input train_data is an argument required
if -i option is used, train_data is the input data matrix
if -k option is used, train_data is the kernel matrix computed
on the input data
Other options have default values
-o output SVM model after training, default: no output
-h parameter for capacity control, the square root of VC
dimension, default: VC-dim=1.0
-t kernel type, 1: RBF, 2: polynomial, default: polynomial
-p parameter value used in kernel function, sigma in RBF,
or polynomial order, default: polynomial order=1.0
-q maximal size of subproblems if using chunking,
default: no chunking with subproblem size=0
-I the maximum number of chunking iterations,
default: maximum iteration=2000
2. Rsvm -i toy.txt -o toy.mod -h 2 -t 2 -p 1.0
This command runs the Rsvm program on the "toy.txt" data using a polynomial kernel of
order 1.0 (i.e., a linear kernel) and a VC dimension equal to 4 (-h gives its square root),
and then outputs the RSVM model to the file "toy.mod".
The input file given with the -i option should have the following format:
The first row contains 3 numbers: the total number of examples, the number of positive examples,
and the number of negative examples.
All other rows share the same format:
the first column -- example ID, such as 1, 2, 3, ... (currently the program can only read in numbers).
the second column -- example label (currently only classification problems, so it is +1 or -1).
all other columns -- example attributes (must be numeric).
A sample input data file "toy.txt" looks like:
9 5 4
1 1 -6.9122254e-01 7.2264196e-01
2 1 -9.7383536e-01 2.2725467e-01
3 1 9.9891263e-01 4.6621342e-02
4 1 8.9201459e-01 4.5200660e-01
5 1 1.1566454e-01 9.9328833e-01
6 -1 -7.1751210e-01 -6.9654604e-01
7 -1 1.2084689e-01 -9.9267116e-01
8 -1 9.9500000e-01 -1.0000000e-01
9 -1 -8.0124688e-01 -5.9833389e-01
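As a concrete illustration of this format, the sketch below parses such a file from a stream. The names DataSet and parseInput are ours, introduced only for illustration; the package's real reader lives in getin.cpp:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct DataSet {
    int nTotal = 0, nPos = 0, nNeg = 0;
    std::vector<int> id;
    std::vector<int> label;                  // +1 or -1
    std::vector<std::vector<double>> x;      // attribute columns
};

// Parse the -i input format: a header row "total pos neg" followed by one
// row per example: ID, label, then the attribute values.
DataSet parseInput(std::istream& in) {
    DataSet d;
    in >> d.nTotal >> d.nPos >> d.nNeg;
    std::string line;
    std::getline(in, line);                  // consume the rest of the header
    while (std::getline(in, line)) {
        std::istringstream row(line);
        int id, lab;
        if (!(row >> id >> lab)) continue;   // skip blank lines
        d.id.push_back(id);
        d.label.push_back(lab);
        std::vector<double> attrs;
        for (double v; row >> v; ) attrs.push_back(v);
        d.x.push_back(attrs);
    }
    return d;
}
```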
On-screen results look like:
Rsvm -i toy.txt -o toy.mod -h 2 -t 2 -p 1.0
[toy.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
Chunking 1 ... ...
C_all= 1.206341e+00, intercept= 0.101296
W_value= 0.196671
Iter= 1 Norm = 2.165170 W_value = -0.505143
Iter= 2 Norm = 2.081954 W_value = -0.813631
Iter= 3 Norm = 2.283670 W_value = -1.045971
Iter= 4 Norm = 1.767290 W_value = -1.063918
Iter= 5 Norm = 1.428970 W_value = -1.308528
Iter= 6 Norm = 2.594916 W_value = -1.661511
Iter= 7 Norm = 1.326681 W_value = -1.688193
Iter= 8 Norm = 0.818350 W_value = -1.757644
Iter= 9 Norm = 0.957962 W_value = -1.824766
Iter= 10 Norm = 1.237811 W_value = -1.853912
Iter= 11 Norm = 0.333722 W_value = -1.868752
Iter= 12 Norm = 0.336182 W_value = -1.885240
Iter= 13 Norm = 0.333722 W_value = -1.901763
Iter= 14 Norm = 0.336182 W_value = -1.920120
Iter= 15 Norm = 0.333722 W_value = -1.938516
Iter= 16 Norm = 0.336182 W_value = -1.958953
Iter= 17 Norm = 0.333722 W_value = -1.979434
Iter= 18 Norm = 0.336182 W_value = -2.002188
Iter= 19 Norm = 0.333722 W_value = -2.024991
Iter= 20 Norm = 0.336182 W_value = -2.025256
Iter= 21 Norm = 0.306435 W_value = -2.026858
Iter= 22 Norm = 0.180370 W_value = -2.033201
Iter= 23 Norm = 0.000000 Gradient becomes zero
W_value = -2.033201
Chunking 2 ... ...
The optimal solution has been obtained!!!
Optimal, back to main function ...
Sum_alpha = 4.0950278277
C = 1.940027
largrange multip = 0.259166
intercept = -0.259166
Write out model file ...
3. Rsvm -k toykernel.txt -o toy.mod -h 2 -t 2 -p 1.0
This command performs the same function as the command above. The only
difference is that the input file here is not a raw data file but a kernel matrix in
the following format:
The first row contains 3 numbers: the total number of examples, the number of positive examples,
and the number of negative examples.
All other rows share the same format:
the first column -- example ID, such as 1, 2, 3, ... (currently the program can only read in numbers).
the second column -- example label (currently only classification problems, so it is +1 or -1).
all other columns -- the corresponding columns of the kernel matrix
A sample input kernel file "toykernel.txt" looks like:
9 5 4
1 1 2 1.83736 0.34322 0.710058 1.63784 -0.992607 -0.199122 -0.239969 -1.12146
2 1 1.83736 2 0.0378185 0.234045 1.11309 -1.54045 -0.656726 -0.00830835 -1.64431
3 1 0.34322 0.0378185 2 1.91212 1.16185 -0.250794 -1.07444 -1.98926 -0.171729
4 1 0.710058 0.234045 1.91212 2 1.55215 -0.0451253 -0.659103 -1.84235 -0.0148252
5 1 1.63784 1.11309 1.16185 1.55215 2 -0.225138 -0.027969 -1.01576 -0.313006
6 -1 -0.992607 -1.54045 -0.250794 -0.0451253 -0.225138 2 1.60473 0.35573 1.99167
7 -1 -0.199122 -0.656726 -1.07444 -0.659103 -0.027969 1.60473 2 1.21951 1.49712
8 -1 -0.239969 -0.00830835 -1.98926 -1.84235 -1.01576 0.35573 1.21951 2 0.262593
9 -1 -1.12146 -1.64431 -0.171729 -0.0148252 -0.313006 1.99167 1.49712 0.262593 2
On-screen results will be exactly the same as those pasted above for
the first command.
4. Rsvmtest toy.mod toy.txt toytest.txt toy.res
This command runs the Rsvm test program. It reads in the training data "toy.txt" and
the test data "toytest.txt", then tests the obtained RSVM model "toy.mod" on "toytest.txt".
On-screen results look like:
Rsvmtest toy.mod toy.txt toytest.txt toy.res
[toy.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
Correctly read in model parameters
[toytest.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
1 1 1 1.82233 0
2 1 1 0.886069 0.113931
3 1 1 0.173331 0.826669
4 -1 1 0.999989 1.99999
5 1 1 2.21712 0
6 1 -1 -1.00001 2.00001
7 -1 -1 -1.73988 0
8 -1 -1 -1.02295 0
9 -1 -1 -0.789392 0.210608
Result files:
1. Rsvm generates two result files if the -o option is given. For example, after running Rsvm
on the "toy.txt" data, we obtain two files, "alpha.mod" and "toy.mod" (the user-defined name).
The "toy.mod" (a one-dimensional array) contains the coefficients corresponding to each
training example, followed by the intercept, an integer indicating which type of kernel
was used, and the value of the kernel related parameter.
The toy.mod file looks like:
0
1.94002705147
1.94002705147
0.0921782781527
0
0.0921782781527
0
1.94002705147
1.94002705147
-0.259166294177
2
1.0
The "alpha.mod" (a one-dimensional array) amounts to a log file which contains the running
information and the real dual solution alpha (the coefficients in "toy.mod" are alpha divided
by a scalar gamma, see Vapnik's Statistical Learning Theory).
The first integer indicates how the program terminates:
1 -- optimal solution has been obtained;
2 -- program stops, ill conditioned, H may be too large;
3 -- KKT conditions are not satisfied with required precision.
The second integer tells how many subproblems have been optimized within the last chunking iteration.
The third integer provides the number of chunking iterations performed.
The fourth number is the value of alpha'*Q*alpha where Q is the matrix induced by the kernel
matrix and the vector of labels y, Q_ij = y_i*y_j*k(x_i,x_j).
All other numbers are dual solution alpha.
The alpha.mod file looks like:
1
22
1
1.0627825619e+00
0.000000
1.000000
1.000000
0.047514
0.000000
0.047514
0.000000
1.000000
1.000000
2. Rsvmtest generates a result file if an output file name is given. For example, after testing "toy.mod" on the
"toytest.txt" data, we obtain a file "toy.res" (if users specify this name; other names can be
used). The "toy.res" file contains the predictions for the test examples, with each row corresponding to
a test example. The first number is the example ID. The second is the true label, and the third
number is the predicted label. The fourth number is the predicted value of the obtained function,
and the fifth number is the value of the corresponding slack variable.
The toy.res file looks like:
1 1 1 1.8223 0
2 1 1 0.886046 0.113954
3 1 1 0.173358 0.826642
4 -1 1 1 2
5 1 1 2.21709 0
6 1 -1 -1 2
7 -1 -1 -1.73984 0
8 -1 -1 -1.02291 0
9 -1 -1 -0.789385 0.210615