This algorithm takes three parameters:
- `k`: The number of neighbours to consider
- `x_train`: The features of the training data
- `y_train`: The labels of the training data
The `predict` method takes the parameter `new_data_point` and calculates the Euclidean distance between that new data point and each of the points in the training data. It then sorts these distances in ascending order so that the nearest `k` neighbours can be found, and the most frequent label among those neighbours is returned as the prediction.
I've also used the Iris flower data set as sample training data.
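For example, assuming the sketch above and scikit-learn's copy of the data set, usage might look like this:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
model = KNearestNeighbours(k=5, x_train=iris.data, y_train=iris.target)
# Features of an unseen flower: sepal length/width, petal length/width (cm)
print(model.predict(np.array([5.1, 3.5, 1.4, 0.2])))  # expect class 0 (setosa)
```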
This creates a least squares regression line. The constructor method takes the parameter `data_points`, which should be a two-dimensional NumPy array.
There are three other methods (sketched after this list):

- `equation`, which takes no parameters and returns the gradient and y-intercept of the equation
- `predict`, which takes the parameter `x` and plugs it into the equation before returning the corresponding y value
- `graph`, which plots a graph with the training data points and the linear regression line
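A minimal sketch, assuming the closed-form least squares estimates derived below and matplotlib for the plot; the class name `LeastSquaresLine` is a hypothetical choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

class LeastSquaresLine:  # hypothetical name for illustration
    def __init__(self, data_points):
        # data_points: 2D NumPy array of shape (n, 2); columns are x and y
        self.x = data_points[:, 0]
        self.y = data_points[:, 1]
        x_mean, y_mean = self.x.mean(), self.y.mean()
        # Closed-form least squares gradient and y-intercept (derived below)
        self.m = np.sum((self.x - x_mean) * (self.y - y_mean)) / np.sum((self.x - x_mean) ** 2)
        self.c = y_mean - self.m * x_mean

    def equation(self):
        # Gradient and y-intercept of y = mx + c
        return self.m, self.c

    def predict(self, x):
        # Plug x into the fitted equation to get the corresponding y value
        return self.m * x + self.c

    def graph(self):
        # Plot the training data points and the linear regression line
        plt.scatter(self.x, self.y, label="data points")
        xs = np.linspace(self.x.min(), self.x.max(), 100)
        plt.plot(xs, self.predict(xs), color="red", label="least squares line")
        plt.legend()
        plt.show()
```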
The residual is the vertical distance between each of the data points and the linear regression line. We want to minimise the residual sum of squares (RSS).
We can write RSS as:

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Additionally, each predicted value lies on the line:

$$\hat{y}_i = mx_i + c$$

We can substitute this into RSS and expand the brackets:

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - mx_i - c)^2 = \sum_{i=1}^{n} \left( y_i^2 - 2mx_iy_i - 2cy_i + m^2x_i^2 + 2mcx_i + c^2 \right)$$

As we want to minimise RSS, we can differentiate it with respect to $m$ and $c$ and set both partial derivatives to zero:

$$\frac{\partial\,\mathrm{RSS}}{\partial m} = \sum_{i=1}^{n} 2x_i(mx_i + c - y_i) = 0, \qquad \frac{\partial\,\mathrm{RSS}}{\partial c} = \sum_{i=1}^{n} 2(mx_i + c - y_i) = 0$$

Therefore, we can find the gradient and y-intercept that minimise RSS:

$$m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad c = \bar{y} - m\bar{x}$$
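As a quick sanity check, the closed-form estimates can be compared against NumPy's `np.polyfit` (a degree-1 fit returns the gradient and intercept) on some arbitrary illustrative points:

```python
import numpy as np

# Arbitrary illustrative points, roughly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
print(m, c)                 # closed-form gradient and y-intercept
print(np.polyfit(x, y, 1))  # NumPy's fit: [gradient, intercept]; should match
```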