An intuitive and fast nonparametric algorithm to estimate class proportions
An updated version of the method can be found here
Clone repository
git clone git@github.com:Dzeiberg/ClassPriorEstimation.git
Install dependencies
cd ClassPriorEstimation
pip install -r requirements.txt
An example call to generate training data
mkdir rawTrainData
python dataProcessing/generateTrainSamples.py --trainSetSize 100000 --saveDirectory rawTrainData
Given a directory of raw datasets with each dataset represented by a JSON file in the following format:
{
"sample": [x_1, ..., x_N],
"component_assignment": [s_1, ..., s_N],
"class_prior": alpha
}
where x_i represents the i-th sample and s_i represents its positive vs. unlabeled assignment,
the directory of datasets can be processed by calling:
python dataProcessing/processDataset.py --sample_directory data_directory --distance_metric euclidian --number_curves_to_average 10
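The JSON layout described above can be produced with a short script. The following is a minimal sketch, not part of the repository: the helper names `make_dataset` and `write_dataset`, the file name `dataset_0.json`, the 1-D Gaussian components, and the 0/1 encoding of `component_assignment` are all assumptions made for illustration. Check the encoding expected by `processDataset.py` before relying on it.

```python
import json
import random
from pathlib import Path


def make_dataset(n_samples, class_prior, seed=0):
    """Build one dataset dict with the three keys named in the README."""
    rng = random.Random(seed)
    samples, assignments = [], []
    for _ in range(n_samples):
        # Assumption: 1 marks the positive component, 0 the other component.
        is_positive = rng.random() < class_prior
        # Hypothetical 1-D Gaussian components; real samples may be vectors.
        mean = 2.0 if is_positive else 0.0
        samples.append(rng.gauss(mean, 1.0))
        assignments.append(int(is_positive))
    return {
        "sample": samples,
        "component_assignment": assignments,
        "class_prior": class_prior,
    }


def write_dataset(dataset, directory, name):
    """Write one dataset as a JSON file and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    path.write_text(json.dumps(dataset))
    return path


if __name__ == "__main__":
    # "data_directory" matches the --sample_directory value used above.
    print(write_dataset(make_dataset(1000, 0.3), "data_directory", "dataset_0.json"))
```

Each file written this way holds one dataset, so a directory of several such files matches the "one JSON file per dataset" arrangement described above.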
The supported distance metrics for constructing the distance curve are: {euclidian, city block, yang(p=1), yang(p=2)}
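For reference, the first two metrics in that list are the standard Euclidean and city block (Manhattan) distances. The sketch below only illustrates their textbook definitions; it is not the repository's implementation, and the function names are hypothetical. The yang metrics are omitted because the README does not define them. The value passed to --distance_metric should still be spelled exactly as listed above.

```python
import math


def euclidean(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def city_block(u, v):
    """Sum of the absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    print(euclidean([0, 0], [3, 4]))   # 5.0
    print(city_block([0, 0], [3, 4]))  # 7
```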
To estimate the class prior from a given dataset, run:
python estimate.py --model_path model.hdf5 --features_path features.npy --labels_path labels.npy --out_file out.txt
See also the list of contributors who participated in this project.