```README for the MatLab package of Regression Error Characteristic Curves
Jinbo Bi (bij2@rpi.edu) 6/4/2003
Kristin Bennett (bennek@rpi.edu)

This package requires MatLab and the MatLab Statistics Toolbox.
This package includes 4 MatLab .m files: CDF.m, rec_curve.m, syndata.m
and sample_curves.m (the last file contains two functions, sample_curves
and draw_syn_curves).
The hierarchy of these functions is as follows:
1) Function CDF is the most basic function; rec_curve calls it to
estimate the cumulative distribution function.
2) Function syndata is a stand-alone function for generating synthetic
data.
3) Function rec_curve is the core program for plotting a REC curve.
4) Function draw_syn_curves calls rec_curve to plot various REC curves.
5) Function sample_curves calls syndata to generate data with Gaussian
noise and data with Laplacian noise, and then calls draw_syn_curves to
draw REC curves for 4 different models on both data sets. See our paper
for a description of the 4 models.

Help for each function is available in the MatLab environment through
the help command. For example, typing "help rec_curve" displays an
explanation of the function and of its inputs and outputs, which is
useful when you forget a function's input arguments.

The help headers of the four main functions are reproduced below:
1)
%The function [x_sort,area_over,area_under]=CDF(x) is used to
%estimate the cumulative distribution function of the random
%variable x and compute the areas under and over the CDF curve.
%Inputs: x -- a vector of real numbers, a sample of the random
%        variable x.
%Outputs: x_sort -- a matrix of two columns: the first column
%        is the original x sorted in ascending order, and the second
%        column is the estimated cumulative probability at each value.
%        area_over -- the area over the CDF curve, a real number.
%        area_under -- the area under the CDF curve.

2)
%The function AOC = rec_curve(error_metric,y,yhat,lineSpec) is used
%to draw an REC curve based on the residuals y-yhat and to return
%the area over the REC curve. Note that this REC plot is scaled by
%the mean model, i.e., the model that predicts the mean of the
%actual response for every example.
%Inputs: error_metric -- the type of error metric: if it is 'AD',
%        the REC curve is based on the absolute deviation; if it is
%        'SE', the REC curve is based on the squared residual.
%        y -- the actual values of the response.
%        yhat -- the predicted values of the response.
%        lineSpec -- the line specification of the REC curve; for
%        example, 'r-' draws the REC curve as a solid red line. See
%        the MatLab line specification (LineSpec) syntax for details.
%Outputs: AOC -- the area over the REC curve.

3)
%The function [x,yn,y]=syndata(noise_type,A,B,C,n) is used to
%generate synthetic data with additive noise. The independent
%variable x is randomly generated in a 20-dimensional space from a
%uniform distribution on [A,B]. The dependent variable y is generated
%using the function y = sum_i C*x_i + r, where C is a constant and i
%runs from 1 to 10; hence the last 10 independent variables are
%irrelevant noise variables. The variable r is the additive noise,
%which can follow the Gaussian, uniform, Laplacian, Gamma or Weibull
%distribution depending on the choice of 'noise_type'.
%Inputs: noise_type -- the distribution that the additive noise
%         follows. If it is 'Gaussian', a Gaussian random variable r
%         is generated. Other choices are 'uniform', 'Laplacian',
%         'Gamma' and 'Weibull'.
%        A -- the left end of the interval for the uniform dist of x.
%        B -- the right end of the interval for the uniform dist of x.
%        C -- the constant coefficient used in the model to generate y.
%        n -- the number of synthetic data examples.
%Outputs: x -- the n samples of the 20 independent variables.
%         yn -- the noisy response generated from x using the function above.
%         y -- the raw response y generated using y = sum_i C*x_i

4)
%The function sample_curves is used to generate sample REC curves
%based on the absolute deviation and squared error on synthetic
%data with Gaussian noise and Laplacian noise.
%This function generates 3 figures: the first shows information
%about the response in the Gaussian noise data; the second shows
%information about the response in the Laplacian noise data; the
%third shows 4 REC curves based on the absolute deviation and
%squared error for the Gaussian noise data (top two) and the
%Laplacian noise data (bottom two).

Consult our paper "Regression Error Characteristic Curves" for a more
complete description of the REC curve plot and of our experiments on
synthetic data. This package provides preliminary results on REC curve
analysis. Contact either Dr. Kristin Bennett (bennek@rpi.edu) or Jinbo
Bi (bij2@rpi.edu) for information on ongoing work.```
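The CDF and rec_curve help headers describe the REC curve only in words. The Python sketch below is not part of the MatLab package, and the names empirical_cdf, rec_points and mean_model_aoc are hypothetical. It shows one way to compute the same quantities: the REC curve is the empirical CDF of the per-example errors, plotted as error tolerance against the fraction of examples predicted within that tolerance. It assumes non-negative inputs (errors) and the step-function estimate F(e) = (number of errors <= e)/n on [0, max error]; under that assumption the area over the curve equals the sample mean of the errors (the mean absolute deviation for 'AD', the mean squared error for 'SE'). It also assumes that "scaled by the mean model" means comparing against a baseline that predicts mean(y) for every example. The MatLab functions may integrate or scale the curve differently, so treat this as an illustration rather than a reimplementation.

```python
# Illustrative sketch only: these names are hypothetical and are not the
# package's MatLab API. Pure Python, no third-party dependencies.


def empirical_cdf(x):
    """Estimate the CDF of a sample of non-negative values.

    Returns (sorted values, cumulative probabilities, area over, area under),
    where the CDF is the step function F(e) = (# of values <= e) / n and the
    areas are measured inside the box [0, max(x)] x [0, 1].
    """
    if not x:
        raise ValueError("x must contain at least one value")
    xs = sorted(x)
    n = len(xs)
    probs = [(i + 1) / n for i in range(n)]
    # On [xs[i-1], xs[i]) the step CDF has height i/n (xs[-1] is taken as 0).
    area_under = 0.0
    prev = 0.0
    for i, v in enumerate(xs):
        area_under += (i / n) * (v - prev)
        prev = v
    area_over = xs[-1] - area_under  # box area minus area under the curve
    return xs, probs, area_over, area_under


def rec_points(y, yhat, metric="AD"):
    """Return the REC curve as (tolerance, accuracy) pairs plus its AOC."""
    if len(y) != len(yhat):
        raise ValueError("y and yhat must have the same length")
    if metric == "AD":  # absolute deviation |y - yhat|
        errors = [abs(a - b) for a, b in zip(y, yhat)]
    elif metric == "SE":  # squared residual (y - yhat)^2
        errors = [(a - b) ** 2 for a, b in zip(y, yhat)]
    else:
        raise ValueError("metric must be 'AD' or 'SE'")
    tolerances, accuracies, aoc, _ = empirical_cdf(errors)
    return list(zip(tolerances, accuracies)), aoc


def mean_model_aoc(y, metric="AD"):
    """AOC of the baseline that predicts mean(y) for every example."""
    y_bar = sum(y) / len(y)
    _, aoc = rec_points(y, [y_bar] * len(y), metric)
    return aoc


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    yhat = [1.5, 2.0, 2.0, 4.5]
    points, aoc = rec_points(y, yhat, "AD")
    print(points)  # [(0.0, 0.25), (0.5, 0.5), (0.5, 0.75), (1.0, 1.0)]
    print(aoc)  # 0.5, the mean absolute deviation of the model
    print(mean_model_aoc(y, "AD"))  # 1.0 for the mean-model baseline
```

A smaller area over the curve means a lower average error, so in this toy example the model (AOC 0.5) beats the mean-model baseline (AOC 1.0). Tied errors produce repeated tolerances, which simply become vertical steps when the points are plotted.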
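The syndata help header specifies the data-generating model but not the noise parameters. The Python sketch below follows that description: x is uniform on [A,B] in 20 dimensions, the raw response is C times the sum of the first 10 coordinates, and additive noise r is drawn from the named distribution. The unit-scale noise parameters (standard deviation 1 for Gaussian, [-1,1] for uniform, scale 1 for Laplacian, shape and scale 1 for Gamma and Weibull), the helper draw_noise and the optional seed argument are all assumptions; the MatLab version may use different parameters and may center the skewed distributions.

```python
import random

# Hypothetical sketch of the syndata model; noise parameters are assumed.


def draw_noise(rng, noise_type):
    """Draw one additive-noise value r (assumed unit-scale parameters)."""
    if noise_type == "Gaussian":
        return rng.gauss(0.0, 1.0)
    if noise_type == "uniform":
        return rng.uniform(-1.0, 1.0)
    if noise_type == "Laplacian":
        # Laplace(0, 1): exponential magnitude with a random sign.
        e = rng.expovariate(1.0)
        return e if rng.random() < 0.5 else -e
    if noise_type == "Gamma":
        return rng.gammavariate(1.0, 1.0)  # shape 1, scale 1 (not centered)
    if noise_type == "Weibull":
        return rng.weibullvariate(1.0, 1.0)  # scale 1, shape 1 (not centered)
    raise ValueError("unknown noise_type: %r" % (noise_type,))


def syndata(noise_type, a, b, c, n, seed=None):
    """Return (x, yn, y) with the same output order as the MatLab header."""
    rng = random.Random(seed)
    x, yn, y = [], [], []
    for _ in range(n):
        row = [rng.uniform(a, b) for _ in range(20)]  # 20 independent vars
        raw = c * sum(row[:10])  # only the first 10 variables affect y
        x.append(row)
        y.append(raw)
        yn.append(raw + draw_noise(rng, noise_type))
    return x, yn, y


if __name__ == "__main__":
    x, yn, y = syndata("Laplacian", -1.0, 1.0, 1.0, 5, seed=0)
    print(len(x), len(x[0]))  # 5 20
    print([round(a - b, 3) for a, b in zip(yn, y)])  # the drawn noise values
```

Returning the noise-free y alongside yn mirrors the documented outputs and makes the injected noise yn - y directly inspectable.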