-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathen_ub3UlaYRt9I.txt
79 lines (79 loc) · 5.17 KB
/
en_ub3UlaYRt9I.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
Hello, and welcome!
In this video, we’ll be covering non-linear regression basics.
So let’s get started!
These data points correspond to China's Gross Domestic Product (or GDP) from 1960 to 2014.
The first column, is the years, and the second, is China's corresponding annual gross domestic
income in US dollars for that year.
This is what the data points look like.
Now, we have a couple of interesting questions.
First, “Can GDP be predicted based on time?”
And second, “Can we use a simple linear regression to model it?”
Indeed, if the data shows a curvy trend, then linear regression will not produce very accurate
results when compared to a non-linear regression -- simply because, as the name implies, linear
regression presumes that the data is linear.
The scatterplot shows that there seems to be a strong relationship between GDP and time,
but the relationship is not linear.
As you can see, the growth starts off slowly, then from 2005 onward, the growth is very
significant.
And finally, it decelerates slightly in the 2010s.
It kind of looks like either a logistical or exponential function.
So, it requires a special estimation method of the non-linear regression procedure.
For example, if we assume that the model for these data points are exponential functions,
such as y ̂ = θ_0 + θ_1 〖θ_2〗^x, our job is to estimate the parameters of the model,
i.e. θs, and use the fitted model to predict GDP for unknown or future cases.
In fact, many different regressions exist that can be used to fit whatever the dataset
looks like.
You can see a quadratic and cubic regression lines here, and it can go on and on to infinite
degrees.
In essence, we can call all of these "polynomial regression," where the relationship between
the independent variable x and the dependent variable y is modelled as an nth degree polynomial
in x.
With many types of regression to choose from, there’s a good chance that one will fit
your dataset well.
Remember, it’s important to pick a regression that fits the data the best.
So, what is polynomial Regression?
Polynomial regression fits a curved line to your data.
A simple example of polynomial, with degree 3, is shown as y ̂ = θ_0 + θ_1x + θ_2x^2
+ θ_3x^3 or to the power of 3, where θs are parameters to be estimated that makes the model fit perfectly
to the underlying data.
Though the relationship between x and y is non-linear here, and polynomial regression
can fit them, a polynomial regression model can still be expressed as linear regression.
I know it's a bit confusing, but let’s look at an example.
Given the 3rd degree polynomial equation, by defining x_1 = x and x_2 = x^2 or x to the power of 2 and so on,
the model is converted to a simple linear regression with new variables, as y ̂ = θ_0+
θ_1x_1 + θ_2x_2 + θ_3x_3. This model is linear in the parameters to
be estimated, right?
Therefore, this polynomial regression is considered to be a special case of traditional multiple
linear regression.
So, you can use the same mechanism as linear regression to solve such a problem.
Therefore, polynomial regression models CAN fit using the model of least squares.
Least squares is a method for estimating the unknown parameters in a linear regression
model, by minimizing the sum of the squares of the differences between the observed dependent
variable in the given dataset and those predicted by the linear function.
So, what is “non-linear regression” exactly?
First, non-linear regression is a method to model a non-linear relationship between the
dependent variable and a set of independent variables.
Second, for a model to be considered non-linear, y ̂ must be a non-linear function of the
parameters θ, not necessarily the features x.
When it comes to non-linear equation, it can be the shape of exponential, logarithmic,
and logistic, or many other types.
As you can see, in all of these equations, the change of y ̂ depends on changes in the
parameters θ, not necessarily on x only.
That is, in non-linear regression, a model is non-linear by parameters.
In contrast to linear regression, we cannot use the ordinary "least squares" method to fit
the data in non-linear regression, and in general, estimation of the parameters is not easy.
Let me answer two important questions here: First, “How can I know if a problem is linear
or non-linear in an easy way?”
To answer this question, we have to do two things:
The first is to visually figure out if the relation is linear or non-linear.
It’s best to plot bivariate plots of output variables with each input variable.
Also, you can calculate the correlation coefficient between independent and dependent variables,
and if for all variables it is 0.7 or higher there is a linear tendency, and, thus, it’s
not appropriate to fit a non-linear regression.
The second thing we have to do is to use non-linear regression instead of linear regression when
we cannot accurately model the relationship with linear parameters.
The second important questions is, “How should I model my data, if it displays non-linear
on a scatter plot?”
Well, to address this, you have to use either a polynomial regression, use a non-linear
regression model, or "transform" your data, which is not in scope for this course.
Thanks for watching.