README for the package of Rigorous Support Vector Machines (RSVM)

Author: Jinbo Bi (bij2@rpi.edu)
2/20/2003

This program provides a very preliminary implementation of Vapnik's
Rigorous Support Vector Machines (RSVM) (Vladimir N. Vapnik,
Statistical Learning Theory, John Wiley & Sons, Inc., 1998). A
modification of SVM_light or SVMTorch (or even SMO-like software) may
produce a more efficient and complete implementation. This program was
aimed at providing a simple solver for RSVM in order to allow us to
evaluate RSVM and compare it to classic SVMs from the perspective of
generalization ability.

The decomposition algorithm (chunking scheme) developed for classic
SVMs can be applied, with slight modifications, to solving large-scale
RSVM. Hence, existing solvers for classic SVMs can be adapted to RSVM
by choosing a proper optimizer for the subproblems. To solve the
smaller subproblems in the decomposition scheme, this program
implements a gradient descent method in which each iteration finds a
feasible descent direction and performs a line search along that
direction.

Currently, this program works only on classification problems, and the
kernel options are polynomials and RBFs. Since the original goal was to
produce a simple solver, the chunking scheme was wrapped on top of that
solver later and is thus not implemented very efficiently. We tested
this program on the MNIST database of hand-written digits (available at
http://yann.lecun.com/exdb/mnist/index.html). It runs with acceptable
speed on data of size around 2000 and dimension about 1000.

This program can be compiled on IBM AIX 4.3 (using g++) and SUN Solaris
(using g++). With proper organization, it can also be compiled under
Windows Visual C++.

RSVM.zip contains the following files:

Source Codes:
1.  Makefile -- compiles the C++ source codes
2.  Rsvm.cpp -- contains the main function for training RSVM on
    training data.
3.
    Rsvmtest.cpp -- contains the main function for testing an RSVM
    model obtained by Rsvm.
4.  chunk.cpp (chunk.h) -- chunking-related code, including evaluation
    of the termination criteria
5.  direction.cpp (direction.h) -- code for finding the search
    direction (gradient descent)
6.  linesearch.cpp (linesearch.h) -- code for the (exact or inexact)
    line search along the obtained search direction
7.  newsvm.cpp (newsvm.h) -- calls the functions in direction.cpp and
    linesearch.cpp to perform the gradient descent method that solves
    RSVM
8.  valuecomput.cpp (valuecomput.h) -- contains all required function
    evaluations (the Lagrangian function, the gradient, etc.) needed
    by the other cpp files.
9.  kernel.cpp (kernel.h) -- computes or prepares the kernel matrix
    used in no-chunking or chunking mode.
10. getin.cpp (getin.h) -- I/O-related functions that read in data and
    write out models.

Sample data files:
11. toy.txt -- 9 examples in 2-dimensional space, with 5 examples
    labelled +1 and 4 examples labelled -1.
12. toykernel.txt -- the kernel matrix computed as x_i'x_j+1 on the 9
    examples in toy.txt.
13. toytest.txt -- 9 examples in 2-dimensional space, with 5 examples
    in one class (+1) and 4 examples in the other class (-1).

Others:
14. README -- this README file

Installation:

Download RSVM.zip; running unzip RSVM.zip generates a directory RSVM
containing all source codes and sample data.

1.
   Unix:
   1.1 Executing "make all" generates the Rsvm (training) and Rsvmtest
       (test) executables and all object files (extension .o).
   1.2 Executing "make Rsvm" generates the Rsvm executable for training
       RSVM on training data, and the corresponding intermediate files.
   1.3 Executing "make Rsvmtest" generates the Rsvmtest executable for
       testing a model obtained by Rsvm on test data, and the
       corresponding intermediate files.
   1.4 Executing "make cc" removes all intermediate files (.o files) to
       save space.
   1.5 Executing "make clean" removes all intermediate files (.o files)
       and the executables (Rsvm and Rsvmtest).

2. Windows Visual C++ 6.0:
   2.1 Generate a project called Rsvm (Rsvm.dsw -- workspace file,
       Rsvm.dsp -- project file), and add the files Rsvm.cpp (main
       function), chunk.cpp, direction.cpp, getin.cpp, kernel.cpp,
       linesearch.cpp, newsvm.cpp and valuecomput.cpp to the project.
       Go to Build and build the executable Rsvm.exe. You may get
       warnings about unreferenced variables.
   2.2 Generate a project called Rsvmtest (Rsvmtest.dsw and Rsvmtest.dsp
       are the related files), and add Rsvmtest.cpp, getin.cpp and
       kernel.cpp to the project. Build the executable Rsvmtest.exe.
       You may get warnings about unreferenced variables.

Program Usage:

The commands introduced here work under Unix.

1.
Running the command Rsvm without any arguments shows the following
help information:

   Usage: Rsvm -i(-k) train_data [options] -o out_model -h VC-dim
          -t kernel_type -p kerpar -q chunk_size -I Iterations

   The input train_data is a required argument:
      if the -i option is used, train_data is the input data matrix;
      if the -k option is used, train_data is the kernel matrix
      computed on the input data.

   The other options have default values:
   -o   output the SVM model after training, default: no output
   -h   parameter for capacity control, the square root of the VC
        dimension, default: VC-dim=1.0
   -t   kernel type, 1: RBF, 2: polynomial, default: polynomial
   -p   parameter value used in the kernel function, sigma for RBF or
        the polynomial order, default: polynomial order=1.0
   -q   maximal size of the subproblems if using chunking, default: no
        chunking with subproblem size=0
   -I   the maximum number of chunking iterations, default: maximum
        iteration=2000

2. Rsvm -i toy.txt -o toy.mod -h 2 -t 2 -p 1.0

This command runs the Rsvm program on the "toy.txt" data using a
polynomial kernel of order 1.0 (i.e., a linear kernel) and a VC
dimension equal to 4 (the -h argument, 2, is its square root), and
then outputs the RSVM model into the file "toy.mod".

The input file given with option -i should have the following format:
   The first row contains 3 numbers: the total number of examples, the
   number of positive examples, and the number of negative examples.
   All other rows have the same format:
      the first column  -- the example ID, such as 1, 2, 3, ...
                           (currently it can only read in numbers);
      the second column -- the example label (currently only
                           classification problems, so it is +1 or -1);
      all other columns -- the example attributes (numerical values).
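The package's own reader lives in getin.cpp and is not reproduced
here. Purely as an illustration of the layout just described, a
minimal stand-alone parser might look like the sketch below (DataSet
and parseData are hypothetical names, not part of the package):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Layout of an -i input file: a "total positives negatives" header
// row, then one row per example: ID, label (+1/-1), attributes.
struct DataSet {
    int nTotal = 0, nPos = 0, nNeg = 0;
    std::vector<int> id;
    std::vector<int> label;
    std::vector<std::vector<double>> x;   // attribute rows
};

DataSet parseData(std::istream& in) {
    DataSet d;
    in >> d.nTotal >> d.nPos >> d.nNeg;
    std::string line;
    std::getline(in, line);               // consume rest of header line
    while (std::getline(in, line)) {
        std::istringstream row(line);
        int id, label;
        if (!(row >> id >> label)) continue;   // skip blank lines
        d.id.push_back(id);
        d.label.push_back(label);
        std::vector<double> attrs;
        for (double v; row >> v; ) attrs.push_back(v);
        d.x.push_back(attrs);
    }
    return d;
}
```

Run on the contents of "toy.txt" below, this would report 9 examples
(5 positive, 4 negative), each with 2 attributes.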
A sample input data file "toy.txt" looks like:

9 5 4
1  1 -6.9122254e-01  7.2264196e-01
2  1 -9.7383536e-01  2.2725467e-01
3  1  9.9891263e-01  4.6621342e-02
4  1  8.9201459e-01  4.5200660e-01
5  1  1.1566454e-01  9.9328833e-01
6 -1 -7.1751210e-01 -6.9654604e-01
7 -1  1.2084689e-01 -9.9267116e-01
8 -1  9.9500000e-01 -1.0000000e-01
9 -1 -8.0124688e-01 -5.9833389e-01

On-screen results look like:

Rsvm -i toy.txt -o toy.mod -h 2 -t 2 -p 1.0
[toy.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
Chunking 1 ... ...
C_all= 1.206341e+00, intercept= 0.101296
W_value= 0.196671
Iter= 1   Norm = 2.165170   W_value = -0.505143
Iter= 2   Norm = 2.081954   W_value = -0.813631
Iter= 3   Norm = 2.283670   W_value = -1.045971
Iter= 4   Norm = 1.767290   W_value = -1.063918
Iter= 5   Norm = 1.428970   W_value = -1.308528
Iter= 6   Norm = 2.594916   W_value = -1.661511
Iter= 7   Norm = 1.326681   W_value = -1.688193
Iter= 8   Norm = 0.818350   W_value = -1.757644
Iter= 9   Norm = 0.957962   W_value = -1.824766
Iter= 10  Norm = 1.237811   W_value = -1.853912
Iter= 11  Norm = 0.333722   W_value = -1.868752
Iter= 12  Norm = 0.336182   W_value = -1.885240
Iter= 13  Norm = 0.333722   W_value = -1.901763
Iter= 14  Norm = 0.336182   W_value = -1.920120
Iter= 15  Norm = 0.333722   W_value = -1.938516
Iter= 16  Norm = 0.336182   W_value = -1.958953
Iter= 17  Norm = 0.333722   W_value = -1.979434
Iter= 18  Norm = 0.336182   W_value = -2.002188
Iter= 19  Norm = 0.333722   W_value = -2.024991
Iter= 20  Norm = 0.336182   W_value = -2.025256
Iter= 21  Norm = 0.306435   W_value = -2.026858
Iter= 22  Norm = 0.180370   W_value = -2.033201
Iter= 23  Norm = 0.000000
Gradient becomes zero
W_value = -2.033201
Chunking 2 ... ...
The optimal solution has been obtained!!!
Optimal, back to main function ...
Sum_alpha = 4.0950278277
C = 1.940027
largrange multip = 0.259166
intercept = -0.259166
Write out model file ...

3.
   Rsvm -k toykernel.txt -o toy.mod -h 2 -t 2 -p 1.0

This command performs the same function as the command above. The only
difference is that the input file here is not a raw data file but a
kernel matrix in the following format:
   The first row contains 3 numbers: the total number of examples, the
   number of positive examples, and the number of negative examples.
   All other rows have the same format:
      the first column  -- the example ID, such as 1, 2, 3, ...
                           (currently it can only read in numbers);
      the second column -- the example label (currently only
                           classification problems, so it is +1 or -1);
      all other columns -- the corresponding columns of the kernel
                           matrix.

A sample input kernel file "toykernel.txt" looks like:

9 5 4
1  1 2 1.83736 0.34322 0.710058 1.63784 -0.992607 -0.199122 -0.239969 -1.12146
2  1 1.83736 2 0.0378185 0.234045 1.11309 -1.54045 -0.656726 -0.00830835 -1.64431
3  1 0.34322 0.0378185 2 1.91212 1.16185 -0.250794 -1.07444 -1.98926 -0.171729
4  1 0.710058 0.234045 1.91212 2 1.55215 -0.0451253 -0.659103 -1.84235 -0.0148252
5  1 1.63784 1.11309 1.16185 1.55215 2 -0.225138 -0.027969 -1.01576 -0.313006
6 -1 -0.992607 -1.54045 -0.250794 -0.0451253 -0.225138 2 1.60473 0.35573 1.99167
7 -1 -0.199122 -0.656726 -1.07444 -0.659103 -0.027969 1.60473 2 1.21951 1.49712
8 -1 -0.239969 -0.00830835 -1.98926 -1.84235 -1.01576 0.35573 1.21951 2 0.262593
9 -1 -1.12146 -1.64431 -0.171729 -0.0148252 -0.313006 1.99167 1.49712 0.262593 2

On-screen results will be exactly the same as those pasted above for
the first command.

4. Rsvmtest toy.mod toy.txt toytest.txt toy.res

This command runs the Rsvm test program. It reads in the training data
"toy.txt" and the test data "toytest.txt", then tests the obtained
RSVM model "toy.mod" on "toytest.txt".

On-screen results look like:

Rsvmtest toy.mod toy.txt toytest.txt toy.res
[toy.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
Correctly read in model parameters
[toytest.txt]
Number of features is 9
Number of positive points is 5
Number of negative points is 4
Number of Molecules is 9
Correctly read in data!
1  1  1  1.82233 0
2  1  1  0.886069 0.113931
3  1  1  0.173331 0.826669
4 -1  1  0.999989 1.99999
5  1  1  2.21712 0
6  1 -1 -1.00001 2.00001
7 -1 -1 -1.73988 0
8 -1 -1 -1.02295 0
9 -1 -1 -0.789392 0.210608

Result files:

1. Rsvm generates two result files if the option -o is on. For
example, after running Rsvm on the "toy.txt" data, we obtain two
files, "alpha.mod" and "toy.mod" (the user-defined name).

The "toy.mod" file (a one-dimensional array) contains the coefficients
corresponding to each training example, followed by the intercept, an
integer indicating which type of kernel was used, and the value of the
kernel-related parameter. The toy.mod file looks like:

0 1.94002705147 1.94002705147 0.0921782781527 0 0.0921782781527 0 1.94002705147 1.94002705147 -0.259166294177 2 1.0

The "alpha.mod" file (a one-dimensional array) amounts to a log file
containing the running information and the real dual solution alpha
(the coefficients in "toy.mod" are alpha divided by a scalar gamma;
see Vapnik's Statistical Learning Theory).
   The first integer indicates how the program terminated:
      1 -- the optimal solution has been obtained;
      2 -- the program stopped; ill-conditioned, H may be too large;
      3 -- the KKT conditions are not satisfied with the required
           precision.
   The second integer tells how many subproblems were optimized within
   the last chunking iteration.
   The third integer gives the number of chunking iterations
   performed.
   The fourth number is the value of alpha'*Q*alpha, where Q is the
   matrix induced by the kernel matrix and the vector of labels y,
   Q_ij = y_i*y_j*k(x_i,x_j).
   All other numbers are the dual solution alpha.

The alpha.mod file looks like:

1 22 1 1.0627825619e+00 0.000000 1.000000 1.000000 0.047514 0.000000 0.047514 0.000000 1.000000 1.000000

2. Rsvmtest generates a result file if the option is on.
For example, after testing "toy.mod" on the "toytest.txt" data, we
obtain a file "toy.res" (if users specify this name; other names can
be used). The "toy.res" file contains the predictions for the test
examples, with each row corresponding to one test example. The first
number is the example ID. The second is the real label, and the third
number is the predicted label. The fourth number is the predicted
value of the obtained function, and the fifth number is the value of
the corresponding slack variable. The toy.res file looks like:

1  1  1  1.8223 0
2  1  1  0.886046 0.113954
3  1  1  0.173358 0.826642
4 -1  1  1 2
5  1  1  2.21709 0
6  1 -1 -1 2
7 -1 -1 -1.73984 0
8 -1 -1 -1.02291 0
9 -1 -1 -0.789385 0.210615
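A note on conventions, inferred from the sample files rather than from
the source code: every point in toy.txt has unit norm, so x_i'x_j+1
always lies in [0,2], yet the sample toykernel.txt contains negative
entries; the entries there are evidently the label-signed values
Q_ij = y_i*y_j*(x_i'x_j+1), i.e. the matrix Q defined in the
description of alpha.mod. Likewise, the predicted values in toy.res
are reproduced by f(x) = sum_i c_i*y_i*k(x_i,x) - b, where c_i and b
are the coefficients and intercept stored in toy.mod. A minimal sketch
of this decision function (predict is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Predicted value for a point x under the linear (order-1 polynomial)
// kernel k(u,v) = u'v + 1, given the training points xs with labels
// y, the per-example coefficients c from a toy.mod-style model, and
// the stored intercept b.  The sign convention
// f(x) = sum_i c_i*y_i*k(x_i,x) - b is inferred from the sample
// files, not taken from the package's source code.
double predict(const std::vector<std::vector<double>>& xs,
               const std::vector<int>& y,
               const std::vector<double>& c, double b,
               const std::vector<double>& x) {
    double f = -b;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        double k = 1.0;                        // k(x_i, x) = x_i'x + 1
        for (std::size_t j = 0; j < x.size(); ++j) k += xs[i][j] * x[j];
        f += c[i] * y[i] * k;
    }
    return f;
}
```

With the nine toy.txt points, the coefficients listed in toy.mod, and
b = -0.259166294177, this gives about 1.822 on training example 1 and
about -1.000 on example 6, matching the corresponding rows of toy.res.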
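Finally, the subproblem solver described in the overview (each
iteration finds a feasible descent direction and performs a line
search along it) is a standard technique. It can be sketched on a
generic box-constrained quadratic program, min 0.5*a'Qa - c'a subject
to 0 <= a <= C; this is an illustration of the idea only, not the
RSVM dual itself, and boxQP is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Minimize f(a) = 0.5*a'Qa - c'a subject to 0 <= a_i <= C.  Each
// iteration builds a feasible descent direction by zeroing the
// components of the negative gradient that would leave the box, then
// takes an exact line search step (for a quadratic, t* = -g'd/d'Qd),
// clipped so that a + t*d stays feasible.
std::vector<double> boxQP(const std::vector<std::vector<double>>& Q,
                          const std::vector<double>& c, double C,
                          int maxIter = 1000, double tol = 1e-10) {
    const int n = static_cast<int>(c.size());
    std::vector<double> a(n, 0.0);
    for (int it = 0; it < maxIter; ++it) {
        std::vector<double> g(n, 0.0), d(n, 0.0);
        for (int i = 0; i < n; ++i) {          // gradient g = Q*a - c
            for (int j = 0; j < n; ++j) g[i] += Q[i][j] * a[j];
            g[i] -= c[i];
        }
        double gd = 0.0;                       // feasible descent direction
        for (int i = 0; i < n; ++i) {
            d[i] = -g[i];
            if ((a[i] <= 0.0 && d[i] < 0.0) || (a[i] >= C && d[i] > 0.0))
                d[i] = 0.0;
            gd += g[i] * d[i];
        }
        if (-gd < tol) break;                  // no feasible descent left
        double dQd = 0.0;                      // exact line search step
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) dQd += d[i] * Q[i][j] * d[j];
        double t = (dQd > 0.0) ? -gd / dQd : 1.0;
        for (int i = 0; i < n; ++i) {          // clip to stay in the box
            if (d[i] > 0.0) t = std::min(t, (C - a[i]) / d[i]);
            if (d[i] < 0.0) t = std::min(t, -a[i] / d[i]);
        }
        for (int i = 0; i < n; ++i) a[i] += t * d[i];
    }
    return a;
}
```

Because the objective is quadratic, the line search has the closed
form t* = -g'd / d'Qd; the clipping step plays the role of keeping the
iterate feasible, which is what distinguishes a feasible-direction
method from plain gradient descent.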