Linear Regression for Machine Learning

Apiumhub - Dec 9 '22 - Dev Community

Introduction

In this article, we will look at Linear Regression, one of the most basic models in Machine Learning.

Linear Regression

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

This equation shows the general form of the linear regression model, where ŷ is the predicted value, n is the number of dimensions (commonly called features), xᵢ is the i-th feature value, and θⱼ is the j-th model parameter (the bias term θ₀ and the feature weights θ₁, …, θₙ).

With this definition in place, let us see how to train a model that follows this equation. Training a model means setting its parameters so that the model best fits the training set. To do that, we also need a measure of how well the model fits the training data; for this purpose we can use the Mean Squared Error (MSE).

$$\text{MSE}(\mathbf{X}, h_{\theta}) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^{T} \mathbf{x}^{(i)} - y^{(i)} \right)^{2}$$

Here m is the number of instances in the training set, x⁽ⁱ⁾ is the feature vector of the i-th instance, y⁽ⁱ⁾ is its target value, and θ is the model's parameter vector.

Taking this into account, to train a Linear Regression model we need to find the value of θ that minimizes the MSE.

Normal Equation

There is a closed-form solution, called the Normal Equation, that directly gives the value of θ that minimizes the cost function.

$$\hat{\theta} = \left( \mathbf{X}^{T} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \mathbf{y}$$

where θ̂ is the value of θ that minimizes the cost function, X is the matrix containing the feature values of all the training instances, and y is the vector of target values, containing y⁽¹⁾ to y⁽ᵐ⁾.

Let’s see an example with linear-looking data to test this equation:

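A minimal NumPy sketch of how such data can be generated (the coefficients 4 and 3 and the noise scale are illustrative assumptions, not values taken from the original experiment):

```python
import numpy as np

np.random.seed(42)  # fix the seed so the example is reproducible

m = 100                                 # number of instances
X = 2 * np.random.rand(m, 1)            # one feature, uniformly sampled in [0, 2)
y = 4 + 3 * X + np.random.randn(m, 1)   # linear function plus Gaussian noise
```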

Now we can compute θ̂ using the Normal Equation:

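A sketch of that computation with NumPy, assuming the X and y generated above:

```python
X_b = np.c_[np.ones((m, 1)), X]  # prepend x0 = 1 (bias term) to each instance
# Normal Equation: theta_hat = (X^T X)^(-1) X^T y
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta_hat)  # should land close to the coefficients used to generate the data
```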

As a reminder, the function we used to generate the data has the form

$$y = \theta_0 + \theta_1 x_1 + \text{Gaussian noise}$$

so we would have expected θ̂ to recover the values of θ₀ and θ₁ used to generate the data.

The result was close enough; nevertheless, the Gaussian noise made it impossible to recover the exact parameters of the original function. Now we can make predictions using θ̂:

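For example, assuming the theta_hat computed above, predicting two new instances looks like this:

```python
X_new = np.array([[0.0], [2.0]])          # two new feature values
X_new_b = np.c_[np.ones((2, 1)), X_new]   # prepend the bias term
y_predict = X_new_b @ theta_hat           # y_hat = X_new_b . theta_hat
print(y_predict)
```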

In practice, there are several ML libraries that perform Linear Regression for us rather than requiring us to compute these equations manually; for example, Scikit-Learn:

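A minimal sketch using Scikit-Learn's LinearRegression, assuming the same X, y, and X_new as above:

```python
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)
print(lin_reg.intercept_, lin_reg.coef_)  # the learned bias term and feature weight
print(lin_reg.predict(X_new))
```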

Standard Correlation Coefficient

The standard correlation coefficient, also called Pearson's correlation coefficient, gives us a measure of the linear correlation between two sets of data. It is the ratio between the covariance of the two variables and the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

This coefficient varies between -1 and 1. The closer it is to 1 or -1, the more strongly correlated the two variables are (positively or negatively).

We can compute this coefficient easily in Python using the pandas library. For our example, we first flatten x and y into one-dimensional arrays with NumPy's concatenate function; then, calling corr() on a DataFrame built from them with pandas, we obtain the correlation matrix between these two variables:

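A sketch of that computation, assuming the X and y arrays from the earlier example (the column names x and y are arbitrary):

```python
import pandas as pd

# np.concatenate flattens the (m, 1) arrays into one-dimensional arrays
data = pd.DataFrame({"x": np.concatenate(X), "y": np.concatenate(y)})
corr_matrix = data.corr()  # Pearson correlation by default
print(corr_matrix)
```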

In this case, the correlation coefficient between x and y is 0.855022, so we can say that they are quite correlated, as expected.

Using Linear Regression

In this section, we are going to use what we have learned so far to analyze the Boston House Prices dataset. We can download it from Kaggle (which provides multiple datasets for testing and learning).

Attribute Information

Attributes in order (the first 13 are input features; MEDV is the target):

  1. CRIM: per capita crime rate by town
  2. ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
  3. INDUS: proportion of non-retail business acres per town
  4. CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
  5. NOX: nitric oxides concentration (parts per 10 million) [parts/10M]
  6. RM: average number of rooms per dwelling
  7. AGE: proportion of owner-occupied units built prior to 1940
  8. DIS: weighted distances to five Boston employment centers
  9. RAD: index of accessibility to radial highways
  10. TAX: full-value property-tax rate per $10,000 [$/10k]
  11. PTRATIO: pupil-teacher ratio by town
  12. B: The result of the equation B=1000(Bk – 0.63)^2 where Bk is the proportion of blacks by town
  13. LSTAT: % lower status of the population
  14. MEDV: Median value of owner-occupied homes in $1000’s [k$]

Our aim will be to see the correlation between prices and other variables in the dataset, using the standard correlation coefficient. We first load the dataset using pandas:

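A minimal loading sketch (the file name HousingData.csv is an assumption; adjust it to whatever your Kaggle download is called):

```python
import pandas as pd

housing = pd.read_csv("HousingData.csv")  # hypothetical file name
print(housing.head())
```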

Now we can look at how each attribute correlates with the Median value of owner-occupied homes:

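A sketch of that computation, assuming the housing DataFrame loaded above; the scatter plot at the end produces the kind of chart discussed below:

```python
import matplotlib.pyplot as plt

corr_matrix = housing.corr()
print(corr_matrix["MEDV"].sort_values(ascending=False))

# scatter plot of RM (rooms per dwelling) against MEDV (median home value)
housing.plot(kind="scatter", x="RM", y="MEDV", alpha=0.5)
plt.show()
```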

The correlation coefficient varies between -1 and 1. When it is close to 1, there is a strong positive correlation; in this case, the price tends to go up when RM (the average number of rooms per dwelling) goes up. Here the correlation coefficient is 0.695, and, despite a few outlying points visible in the chart, there is a strong correlation between these two features.

Conclusion

We have seen the basics of linear regression and its applications to machine learning. We have learned how to predict new values with a linear regression model, and we have seen how to use the standard correlation coefficient matrix to find correlated features in a dataset.
