weka.classifiers.mi
Class MINND

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.mi.MINND
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, MultiInstanceCapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler

public class MINND
extends Classifier
implements OptionHandler, MultiInstanceCapabilitiesHandler, TechnicalInformationHandler

Multiple-Instance Nearest Neighbour with Distribution learner.

It uses gradient descent to find the weight for each dimension of each exemplar, from the starting point of 1.0. In order to avoid overfitting, it uses the mean-square function (i.e. the Euclidean distance) to search for the weights.
It then uses the weights to cleanse the training data. After that it searches for the weights again, starting from the weights found before.
Finally it uses the most recently updated weights to cleanse the test exemplar and then finds the nearest neighbour of the test exemplar using a partly-weighted Kullback distance. However, the variances in the Kullback distance are the ones from before cleansing.

For more information see:

Xin Xu (2001). A nearest distribution approach to multiple-instance learning. Hamilton, NZ.

BibTeX:

 @misc{Xu2001,
    address = {Hamilton, NZ},
    author = {Xin Xu},
    note = {0657.591B},
    school = {University of Waikato},
    title = {A nearest distribution approach to multiple-instance learning},
    year = {2001}
 }
 

Valid options are:

 -K <number of neighbours>
  Set the number of nearest neighbours used for prediction
  (default 1)
 -S <number of neighbours>
  Set the number of nearest neighbours used for cleansing the training data
  (default 1)
 -E <number of neighbours>
  Set the number of nearest neighbours used for cleansing the testing data
  (default 1)
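
These options can also be set programmatically. The following is a hedged usage sketch, not code from WEKA itself; it assumes WEKA is on the classpath and that `train` already holds a loaded multi-instance dataset (relational format, class index set):

```java
import weka.classifiers.mi.MINND;
import weka.core.Instances;

// Sketch only: 'train' is assumed to be a previously loaded
// multi-instance Instances object with its class index set.
MINND minnd = new MINND();
minnd.setOptions(new String[]{"-K", "3", "-S", "1", "-E", "1"});
minnd.buildClassifier(train);
double prediction = minnd.classifyInstance(train.instance(0));
```

Equivalently, the bean-style setters setNumNeighbours(3), setNumTrainingNoises(1) and setNumTestingNoises(1) can be used instead of the option string.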

Version:
$Revision: 5527 $
Author:
Xin Xu (xx5@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
MINND()
           
 
Method Summary
 void buildClassifier(Instances exs)
          As the normal Nearest Neighbour algorithm does, it's lazy and simply records the exemplar information (i.e. the mean and variance of each dimension of each exemplar, and their class labels).
 double classifyInstance(Instance ex)
          Use Kullback Leibler distance to find the nearest neighbours of the given exemplar.
 Instance cleanse(Instance before)
          Cleanse the given exemplar according to the valid and noise data statistics
 void findWeights(int row, double[][] mean)
          Use gradient descent to distort the MU parameter for the exemplar.
 Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 Capabilities getMultiInstanceCapabilities()
          Returns the capabilities of this multi-instance classifier for the relational data.
 int getNumNeighbours()
          Returns the number of nearest neighbours to estimate the class prediction of test bags
 int getNumTestingNoises()
          Returns the number of nearest neighbour instances in the selection of noises in the test data
 int getNumTrainingNoises()
          Returns the number of nearest neighbour instances in the selection of noises in the training data
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 java.lang.String getRevision()
          Returns the revision string.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String globalInfo()
          Returns a string describing this classifier
 double kullback(double[] mu1, double[] mu2, double[] var1, double[] var2, int pos)
          This function calculates the Kullback Leibler distance between two normal distributions.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] args)
          Main method for testing.
 java.lang.String numNeighboursTipText()
          Returns the tip text for this property
 java.lang.String numTestingNoisesTipText()
          Returns the tip text for this property
 java.lang.String numTrainingNoisesTipText()
          Returns the tip text for this property
 Instance preprocess(Instances data, int pos)
          Pre-process the given exemplar according to the other exemplars in the given exemplars.
 void setNumNeighbours(int numNeighbour)
          Sets the number of nearest neighbours to estimate the class prediction of test bags
 void setNumTestingNoises(int numTesting)
          Sets the number of nearest neighbour instances in the selection of noises in the test data
 void setNumTrainingNoises(int numTraining)
          Sets the number of nearest neighbour instances in the selection of noises in the training data
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 double target(double[] x, double[][] X, int rowpos, double[] Y)
          Compute the target function to minimize in gradient descent. The formula is:
1/2*sum[i=1..p](f(X, Xi)-var(Y, Yi))^2

where p is the number of exemplars and Y is the class label.

 
Methods inherited from class weka.classifiers.Classifier
debugTipText, distributionForInstance, forName, getDebug, makeCopies, makeCopy, setDebug
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MINND

public MINND()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier

Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Classifier
Returns:
the capabilities of this classifier
See Also:
Capabilities

getMultiInstanceCapabilities

public Capabilities getMultiInstanceCapabilities()
Returns the capabilities of this multi-instance classifier for the relational data.

Specified by:
getMultiInstanceCapabilities in interface MultiInstanceCapabilitiesHandler
Returns:
the capabilities of this object
See Also:
Capabilities

buildClassifier

public void buildClassifier(Instances exs)
                     throws java.lang.Exception
As the normal Nearest Neighbour algorithm does, it's lazy and simply records the exemplar information (i.e. the mean and variance of each dimension of each exemplar and their class labels) when building the model. There is actually no need to store the exemplars themselves.

Specified by:
buildClassifier in class Classifier
Parameters:
exs - the training exemplars
Throws:
java.lang.Exception - if the model cannot be built properly
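
Since only per-exemplar summary statistics are recorded, the stored model reduces to per-dimension means and variances per bag. A minimal, hypothetical sketch of those statistics (not the WEKA code itself; whether WEKA uses the population or sample variance is not stated here, so population variance is assumed):

```java
public class BagStats {

    // Per-dimension mean of a bag's instances (rows = instances, cols = dimensions).
    static double[] mean(double[][] bag) {
        double[] m = new double[bag[0].length];
        for (double[] inst : bag)
            for (int j = 0; j < m.length; j++)
                m[j] += inst[j] / bag.length;
        return m;
    }

    // Per-dimension population variance of a bag's instances.
    static double[] variance(double[][] bag) {
        double[] m = mean(bag), v = new double[m.length];
        for (double[] inst : bag)
            for (int j = 0; j < v.length; j++)
                v[j] += (inst[j] - m[j]) * (inst[j] - m[j]) / bag.length;
        return v;
    }

    public static void main(String[] args) {
        double[][] bag = {{1, 2}, {3, 6}};   // one bag with two 2-dimensional instances
        System.out.println(java.util.Arrays.toString(mean(bag)));      // [2.0, 4.0]
        System.out.println(java.util.Arrays.toString(variance(bag)));  // [1.0, 4.0]
    }
}
```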

preprocess

public Instance preprocess(Instances data,
                           int pos)
                    throws java.lang.Exception
Pre-process the given exemplar according to the other exemplars in the given data. It also updates the noise data statistics.

Parameters:
data - the whole exemplars
pos - the position of given exemplar in data
Returns:
the processed exemplar
Throws:
java.lang.Exception - if the returned exemplar is wrong

findWeights

public void findWeights(int row,
                        double[][] mean)
Use gradient descent to distort the MU parameter for the exemplar. The exemplar may be the one in the specified row of the given matrix (which has numExemplar rows and numDimension columns), or an exemplar not in the matrix.

Parameters:
row - the given row index
mean - the matrix of the exemplars' mean vectors (numExemplar rows by numDimension columns)

target

public double target(double[] x,
                     double[][] X,
                     int rowpos,
                     double[] Y)
Compute the target function to minimize in gradient descent. The formula is:
1/2*sum[i=1..p](f(X, Xi)-var(Y, Yi))^2

where p is the number of exemplars and Y is the class label. In the case of X=MU, f() is the Euclidean distance between two exemplars together with the related weights, and var() is sqrt(numDimension)*(Y-Yi), where Y-Yi is either 0 (when Y==Yi) or 1 (when Y!=Yi).

Parameters:
x - the weights of the exemplar in question
X - the matrix of the exemplars' means (the MU parameters)
rowpos - row index of x in X
Y - the observed class labels
Returns:
the result of the target function
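
The target function can be sketched as follows. This is a hedged, simplified re-implementation, not the actual WEKA code: f() is taken as the weight-scaled Euclidean distance, the exemplar in question is assumed to sit in row rowpos of X, and that row is assumed to be skipped in the sum:

```java
public class TargetSketch {

    // Weighted Euclidean distance between two mean vectors: each squared
    // per-dimension difference is scaled by the corresponding weight.
    static double weightedDist(double[] w, double[] a, double[] b) {
        double d = 0.0;
        for (int j = 0; j < a.length; j++)
            d += w[j] * (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(d);
    }

    // 1/2 * sum[i=1..p](f(X, Xi) - var(Y, Yi))^2, where
    // var(Y, Yi) = sqrt(numDimension) * (0 if Y == Yi, else 1).
    static double target(double[] w, double[][] X, int rowpos, double[] Y) {
        double sum = 0.0;
        for (int i = 0; i < X.length; i++) {
            if (i == rowpos) continue;   // skip the exemplar itself (assumption)
            double v = (Y[rowpos] == Y[i]) ? 0.0 : Math.sqrt(w.length);
            double diff = weightedDist(w, X[rowpos], X[i]) - v;
            sum += diff * diff;
        }
        return 0.5 * sum;
    }

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {3, 4}};   // two exemplar means, two dimensions
        double[] Y = {0, 1};               // different class labels
        double[] w = {1, 1};               // starting weights of 1.0
        System.out.println(target(w, X, 0, Y));
    }
}
```

With unit weights, the distance to the other exemplar is 5 and its label penalty is sqrt(2), so the value printed is 0.5*(5-sqrt(2))^2.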

classifyInstance

public double classifyInstance(Instance ex)
                        throws java.lang.Exception
Use the Kullback Leibler distance to find the nearest neighbours of the given exemplar. It then uses the K-Nearest Neighbour algorithm to classify the test exemplar.

Overrides:
classifyInstance in class Classifier
Parameters:
ex - the given test exemplar
Returns:
the classification
Throws:
java.lang.Exception - if the exemplar could not be classified successfully

cleanse

public Instance cleanse(Instance before)
                 throws java.lang.Exception
Cleanse the given exemplar according to the valid and noise data statistics

Parameters:
before - the given exemplar
Returns:
the processed exemplar
Throws:
java.lang.Exception - if the returned exemplar is wrong

kullback

public double kullback(double[] mu1,
                       double[] mu2,
                       double[] var1,
                       double[] var2,
                       int pos)
This function calculates the Kullback Leibler distance between two normal distributions. This distance is always positive:

Kullback Leibler distance = integral{f(X)ln(f(X)/g(X))}

Note that X is a vector. Since we assume the dimensions are independent, f(X) (and likewise g(X)) is actually the product of the normal density functions of the individual dimensions. Also note that strictly the formula should use log2 rather than ln, but ln is used simply for computational convenience. Suppose there are P dimensions, f(X) is the first distribution and g(X) the second; the result is then:

Kullback = sum[1..P](ln(SIGMA2/SIGMA1)) + sum[1..P](SIGMA1^2 / (2*(SIGMA2^2))) + sum[1..P]((MU1-MU2)^2 / (2*(SIGMA2^2))) - P/2

Parameters:
mu1 - mu of the first normal distribution
mu2 - mu of the second normal distribution
var1 - variance(SIGMA^2) of the first normal distribution
var2 - variance(SIGMA^2) of the second normal distribution
Returns:
the Kullback distance of two distributions
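
As a worked illustration, the closed form above can be coded directly. This is a hypothetical sketch, not the WEKA method: it omits the pos argument of the real signature, and klDistance is an illustrative name. Since var = SIGMA^2, the term ln(SIGMA2/SIGMA1) becomes 0.5*ln(var2/var1):

```java
public class KullbackSketch {

    // KL distance between two normal distributions with independent dimensions:
    // sum(ln(SIGMA2/SIGMA1)) + sum(SIGMA1^2/(2*SIGMA2^2))
    //   + sum((MU1-MU2)^2/(2*SIGMA2^2)) - P/2
    static double klDistance(double[] mu1, double[] mu2,
                             double[] var1, double[] var2) {
        int p = mu1.length;
        double sum = 0.0;
        for (int i = 0; i < p; i++) {
            double diff = mu1[i] - mu2[i];
            sum += 0.5 * Math.log(var2[i] / var1[i])   // ln(SIGMA2/SIGMA1)
                 + var1[i] / (2.0 * var2[i])
                 + diff * diff / (2.0 * var2[i]);
        }
        return sum - p / 2.0;
    }

    public static void main(String[] args) {
        double[] mu = {0.0, 1.0};
        double[] var = {1.0, 2.0};
        // Identical distributions are at distance 0.
        System.out.println(klDistance(mu, mu, var, var));  // 0.0
    }
}
```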

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -K <number of neighbours>
  Set the number of nearest neighbours used for prediction
  (default 1)
 -S <number of neighbours>
  Set the number of nearest neighbours used for cleansing the training data
  (default 1)
 -E <number of neighbours>
  Set the number of nearest neighbours used for cleansing the testing data
  (default 1)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
an array of strings suitable for passing to setOptions

numNeighboursTipText

public java.lang.String numNeighboursTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumNeighbours

public void setNumNeighbours(int numNeighbour)
Sets the number of nearest neighbours to estimate the class prediction of test bags

Parameters:
numNeighbour - the number of citers

getNumNeighbours

public int getNumNeighbours()
Returns the number of nearest neighbours to estimate the class prediction of test bags

Returns:
the number of neighbours

numTrainingNoisesTipText

public java.lang.String numTrainingNoisesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumTrainingNoises

public void setNumTrainingNoises(int numTraining)
Sets the number of nearest neighbour instances in the selection of noises in the training data

Parameters:
numTraining - the number of noises in training data

getNumTrainingNoises

public int getNumTrainingNoises()
Returns the number of nearest neighbour instances in the selection of noises in the training data

Returns:
the number of noises in training data

numTestingNoisesTipText

public java.lang.String numTestingNoisesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumTestingNoises

public int getNumTestingNoises()
Returns the number of nearest neighbour instances in the selection of noises in the test data

Returns:
the number of noises in test data

setNumTestingNoises

public void setNumTestingNoises(int numTesting)
Sets the number of nearest neighbour instances in the selection of noises in the test data

Parameters:
numTesting - the number of noises in test data

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Classifier
Returns:
the revision

main

public static void main(java.lang.String[] args)
Main method for testing.

Parameters:
args - the options for the classifier