Dzeiberg/ClassPriorEstimation

Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting

An intuitive and fast nonparametric algorithm to estimate class proportions

Updated Paper

Updated Repository

An updated version of the method can be found here

Installing

Clone repository

git clone git@github.com:Dzeiberg/ClassPriorEstimation.git

Install dependencies

cd ClassPriorEstimation
pip install -r requirements.txt

Download Model

Generate Training Data

An example call to generate training data:

mkdir rawTrainData
python dataProcessing/generateTrainSamples.py --trainSetSize 100000 --saveDirectory rawTrainData

Pre-Process Data

Given a directory of raw datasets with each dataset represented by a JSON file in the following format:

{
	"sample": [x_1, ..., x_N],
	"component_assignment": [s_1, ..., s_N],
	"class_prior": alpha
}

where x_i is a sample, s_i is its positive vs. unlabeled assignment, and alpha is the class prior, the directory of datasets can be processed by calling:

python dataProcessing/processDataset.py --sample_directory data_directory --distance_metric euclidian --number_curves_to_average 10

The set of supported distance metrics to be used when constructing the distance curve is: {euclidian, city block, yang(p=1), yang(p=2)}
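
The JSON layout described above can be produced with a short script. The following is a minimal sketch, not part of the repository: the helper names (`make_toy_dataset`, `write_dataset`), the 1-D Gaussian toy data, and the encoding of the assignment (1 for a labeled positive, 0 for an unlabeled sample) are assumptions made for illustration. Only the three keys `sample`, `component_assignment`, and `class_prior` come from the format above, so check `dataProcessing/processDataset.py` for the exact conventions it expects.

```python
import json
import os
import tempfile

import numpy as np


def make_toy_dataset(alpha, n_positive, n_unlabeled, seed=0):
    """Build a toy positive-unlabeled dataset (hypothetical helper).

    Assumption: positives ~ N(1, 1), negatives ~ N(-1, 1), and the
    unlabeled set is a mixture with positive proportion `alpha`.
    """
    rng = np.random.default_rng(seed)
    positive = rng.normal(1.0, 1.0, size=n_positive)
    from_positive = rng.random(n_unlabeled) < alpha
    unlabeled = np.where(
        from_positive,
        rng.normal(1.0, 1.0, size=n_unlabeled),
        rng.normal(-1.0, 1.0, size=n_unlabeled),
    )
    # Assumed encoding: 1 = labeled positive, 0 = unlabeled.
    assignment = np.concatenate(
        [np.ones(n_positive, dtype=int), np.zeros(n_unlabeled, dtype=int)]
    )
    return {
        "sample": np.concatenate([positive, unlabeled]).tolist(),
        "component_assignment": assignment.tolist(),
        "class_prior": float(alpha),
    }


def write_dataset(path, dataset):
    """Write one dataset as a JSON file with the three keys shown above."""
    with open(path, "w") as f:
        json.dump(dataset, f)


# Write one file into a scratch directory and read it back.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "dataset_0.json")
write_dataset(out_path, make_toy_dataset(alpha=0.3, n_positive=100, n_unlabeled=200))

with open(out_path) as f:
    loaded = json.load(f)
print(sorted(loaded.keys()))
```

A directory filled with files like this one could then be passed as `--sample_directory`, provided the assumed encoding matches what the processing script reads.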

Estimate Class Prior

To estimate the class prior from a given dataset, run:

python estimate.py --model_path model.hdf5 --features_path features.npy --labels_path labels.npy --out_file out.txt
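
The command above reads its inputs from `.npy` files. As a rough sketch of how such files can be written with NumPy, the snippet below saves a feature matrix and a label vector and loads them back. The array shapes, dtypes, and the label convention (1 for a labeled positive, 0 for unlabeled) are assumptions for illustration only. The README does not specify them, so consult `estimate.py` before relying on this layout.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: one row per sample, one column per feature.
features = rng.normal(size=(300, 5)).astype(np.float32)

# Assumed convention: 1 = labeled positive, 0 = unlabeled.
labels = np.concatenate(
    [np.ones(100, dtype=np.int64), np.zeros(200, dtype=np.int64)]
)

work_dir = tempfile.mkdtemp()
features_path = os.path.join(work_dir, "features.npy")
labels_path = os.path.join(work_dir, "labels.npy")

# np.save writes a single array in NumPy's binary .npy format.
np.save(features_path, features)
np.save(labels_path, labels)

# Round-trip check: np.load returns arrays with the same shape and dtype.
loaded_features = np.load(features_path)
loaded_labels = np.load(labels_path)
print(loaded_features.shape, loaded_labels.shape)
```

The resulting paths would be supplied through `--features_path` and `--labels_path`.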

Authors

See also the list of contributors who participated in this project.

Acknowledgments
