Goal: implement logistic regression on a given dataset (see subject_dslr.pdf).
If you do not have python3, run:
apt-get install python3
To create a virtual environment, run:
python3 -m venv [your_env_name]
Then activate it:
source [your_env_name]/bin/activate
Finally, install the dependencies and move into the src directory:
pip install -r requirements.txt
cd src
Run
python3 describe.py [a_dataset.csv]
to get the description of a dataset.
Use -h to display the usage and the options.
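As an illustration of what a dataset description usually contains, here is a minimal sketch that computes summary statistics for one numeric column. It is not the code of describe.py: the names percentile and describe_column, the use of None for missing values, the linear-interpolation quartiles and the sample standard deviation (division by n - 1) are all assumptions.

```python
import math


def percentile(sorted_vals, p):
    """Percentile p (0 to 100) of an already sorted list, by linear interpolation."""
    k = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return sorted_vals[lo]
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def describe_column(values):
    """Summary statistics for one numeric column; None entries are skipped."""
    vals = sorted(v for v in values if v is not None)
    n = len(vals)
    if n == 0:
        raise ValueError("column has no numeric values")
    mean = sum(vals) / n
    # Sample standard deviation (assumption); undefined for a single value.
    if n > 1:
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
    else:
        std = float("nan")
    return {
        "count": n,
        "mean": mean,
        "std": std,
        "min": vals[0],
        "25%": percentile(vals, 25),
        "50%": percentile(vals, 50),
        "75%": percentile(vals, 75),
        "max": vals[-1],
    }


# Hypothetical column with one missing grade.
for name, value in describe_column([1, 2, 3, 4, None]).items():
    print(f"{name:>5}: {value}")
```

Skipping missing values before counting keeps the count, the mean and the quartiles consistent with each other.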
Use -h to display the usage and the options of the following programs:
- Run
  python3 histogram.py ../datasets/dataset_train.csv
  to display the histogram that answers the question:
  Which Hogwarts course has a homogeneous distribution of grades across the four houses?
- Run
  python3 scatter_plot.py ../datasets/dataset_train.csv
  to display a scatter plot that answers the question:
  Which two features are similar?
- Run
  python3 pair_plot.py ../datasets/dataset_train.csv
  to display a pair plot that answers the question:
  Which features are we going to use in our training?
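To cross-check the scatter plot answer numerically, one option is the Pearson correlation coefficient: two features that look alike on the plot have a coefficient close to 1 or -1. The sketch below is not one of the scripts of this repository; the function name pearson and the convention of None for a missing grade are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two columns; rows with a None in either are skipped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grades: the second column is the first one scaled and negated,
# so the coefficient is -1 up to floating-point rounding.
course_a = [1.0, 2.0, None, 4.0]
course_b = [-2.0, -4.0, -6.0, -8.0]
print(pearson(course_a, course_b))
```

A pair of features with a coefficient near 1 or -1 carries almost the same information, which is a common reason to keep only one of them for training.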
Use -h to display the usage and the options of the following programs:
- Run
  python3 logreg_train.py ../datasets/dataset_train.csv
  to train the model. It creates a file called "weights.pkl" that is used by the prediction program.
- Run
  python3 logreg_predict.py ../datasets/dataset_test.csv weights.pkl
  to predict the houses of the students in the test dataset. It creates a CSV file called "house.csv" where all the predictions are saved.
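The two steps above can be sketched as follows. This is a minimal illustration, not the code of logreg_train.py or logreg_predict.py: it assumes a one-vs-all scheme trained with batch gradient descent, features already centered around zero, and a pickled dictionary mapping each label to its weights. The function names, the learning rate and the number of epochs are hypothetical.

```python
import math
import pickle


def sigmoid(z):
    # Clamp z so that math.exp cannot overflow on extreme inputs.
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def score(w, x):
    """Linear score; w[0] is the bias, w[1:] are the feature weights."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_one_vs_all(X, y, labels, lr=0.5, epochs=200):
    """Train one binary classifier per label; return {label: weights}."""
    m, n = len(X), len(X[0])
    weights = {}
    for label in labels:
        w = [0.0] * (n + 1)
        targets = [1.0 if t == label else 0.0 for t in y]
        for _ in range(epochs):
            grad = [0.0] * (n + 1)
            for xi, ti in zip(X, targets):
                err = sigmoid(score(w, xi)) - ti
                grad[0] += err
                for j, xj in enumerate(xi):
                    grad[j + 1] += err * xj
            # Batch update with the gradient averaged over the m samples.
            w = [wj - lr * gj / m for wj, gj in zip(w, grad)]
        weights[label] = w
    return weights


def predict(weights, x):
    """Return the label with the highest score (sigmoid is monotonic,
    so this is also the label with the highest probability)."""
    return max(weights, key=lambda label: score(weights[label], x))


# Toy data with hypothetical labels "A" and "B" and one centered feature.
X = [[-1.0], [-0.8], [0.8], [1.0]]
y = ["A", "A", "B", "B"]
weights = train_one_vs_all(X, y, labels=["A", "B"])

# The real programs exchange a file; an in-memory round trip shows the idea.
restored = pickle.loads(pickle.dumps(weights))
print(predict(restored, [-0.9]), predict(restored, [0.9]))
```

Storing one weight vector per house keeps the prediction step simple: each student gets the house whose classifier is the most confident.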