This is a Python implementation of the KQBC algorithm from this paper: https://dl.acm.org/citation.cfm?id=2976304
- test_synth.py - the test that runs the synthetic data experiment
- kqbc.py - the Python functions converted from the original MATLAB code
- matlab/KQBC.m - original MATLAB code from the paper
- matlab/hit_n_run.m - original MATLAB code from the paper
The synthetic data test executes the KQBC algorithm for learning a linear classifier in a d-dimensional space. The target classifier is the vector w∗ = (1, 0, . . . , 0), so the label of an instance x ∈ ℝ^d is the sign of its first coordinate. The instances are normally distributed, N(µ = 0, Σ = I_d).
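A minimal sketch of how such synthetic data can be generated (the function name is illustrative, not taken from kqbc.py):

```python
import numpy as np

def make_synthetic_data(n_samples, dim, seed=0):
    """Draw n_samples points from N(0, I_d) and label them by the
    sign of their first coordinate, i.e. by w* = (1, 0, ..., 0)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, dim))
    y = np.sign(X[:, 0])
    y[y == 0] = 1  # break ties for points exactly on the boundary
    return X, y
```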
The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. For our experiment we reduced the dataset to 2 classes (1, -1): one chosen digit versus all the rest, balanced by discarding most of the other digits' samples. The final data is a subset of the original set, consisting of 1,000 training examples per class (2,000 in total) and 200 test examples per class, built the same way.
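A minimal sketch of this binarization and balancing step (the digit choice and helper name are illustrative assumptions):

```python
import numpy as np

def binarize_mnist(X, y, digit=1, per_class=1000, seed=0):
    """Keep per_class samples of `digit` (labeled +1) and per_class
    samples drawn from all other digits (labeled -1)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == digit)
    neg = np.flatnonzero(y != digit)
    pos = rng.choice(pos, per_class, replace=False)
    neg = rng.choice(neg, per_class, replace=False)
    idx = rng.permutation(np.concatenate([pos, neg]))
    labels = np.where(y[idx] == digit, 1, -1)
    return X[idx], labels
```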
- steps - the number of repetitions of the experiment, for statistical significance
- dim - the dimension of the samples and classifiers
- samples - a list of training set sizes to train on
- hnr_iter - the number of hit-and-run walk steps inside the version-space polytope (a sketch of one such step follows this list)
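For reference, a minimal sketch of a single hit-and-run step inside a polytope {x : Ax ≤ b}, assuming the current point is strictly feasible and the polytope is bounded; this illustrates the role of matlab/hit_n_run.m, not its exact code:

```python
import numpy as np

def hit_and_run_step(x, A, b, rng):
    """One hit-and-run move: pick a uniform random direction, find the
    chord of the polytope {z : Az <= b} through x along that direction,
    and jump to a uniformly random point on the chord."""
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    slack = b - A @ x   # positive for a strictly interior point
    rate = A @ d        # speed at which each constraint is approached
    # Constraints with rate > 0 bound the step from above,
    # those with rate < 0 bound it from below (bounded polytope assumed).
    t_max = np.min(slack[rate > 0] / rate[rate > 0])
    t_min = np.max(slack[rate < 0] / rate[rate < 0])
    return x + rng.uniform(t_min, t_max) * d
```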
Command line example:
```
python test_synth.py --steps 20 --dim 5 --hnr_iter 21 --plot
```
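A minimal sketch of how test_synth.py might parse these flags; the defaults shown here are illustrative assumptions, not the actual values in the repository:

```python
import argparse

parser = argparse.ArgumentParser(description="KQBC synthetic data experiment")
parser.add_argument("--steps", type=int, default=20, help="experiment repetitions")
parser.add_argument("--dim", type=int, default=5, help="dimension of samples and classifiers")
parser.add_argument("--samples", type=int, nargs="+", default=[10, 20, 50, 100],
                    help="list of training set sizes")
parser.add_argument("--hnr_iter", type=int, default=21, help="hit-and-run steps")
parser.add_argument("--plot", action="store_true", help="plot error rate vs. training size")
args = parser.parse_args()
```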
Requirements:
- Python
- MATLAB (only for the original code under matlab/)
- NumPy
- matplotlib
These plots show the error rate on the test set vs. the number of samples the classifier was trained on.
- SVM - trained on k randomly selected labeled samples (a passive learning baseline)
- KQBC - using the classifier (a point in the version space) obtained after T hit-and-run steps; prediction is the sign of the dot product with the sample
The plots show an exponential improvement in label complexity when the classifier is extracted from the KQBC algorithm.
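For completeness, a minimal sketch of how each point on such an error curve can be computed for a linear classifier w returned by either method (the helper name is illustrative):

```python
import numpy as np

def test_error(w, X_test, y_test):
    """Error rate of the linear rule x -> sign(<w, x>) on the test set."""
    pred = np.sign(X_test @ w)
    return np.mean(pred != y_test)
```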