
Linear Regression in Python (using Numpy polyfit)

I always say that learning linear regression in Python is the best first step towards machine learning. Linear regression is simple and easy to understand, even if you are relatively new to data science. So spend time on 100% understanding it! If you get a grasp on its logic, it will serve you as a great foundation for more complex machine learning concepts in the future.

In this tutorial, I'll show you everything you'll need to know about it: the mathematical background, different use-cases and, most importantly, the implementation. We will do that in Python, using numpy (polyfit). I highly recommend doing the coding part with me! If you haven't done so yet, you might want to go through this article first: How to install Python, R, SQL and bash to practice data science!

Remember when you learned about linear functions in math classes? I have good news: that knowledge will become useful after all!

For linear functions, we have this formula: y = a*x + b

In this equation, usually, a and b are given. Knowing them, you can easily calculate the y value for any given x value. If you put all the x–y value pairs on a graph, you'll get a straight line: the relationship between x and y is linear. For instance, with the specific line y = 2 * x + 5, if you change x by 1, y will always change by 2.

And it doesn't matter what a and b values you use, your graph will always show the same characteristics: it will always be a straight line, and only its position and slope change. It also means that x and y will always be in a linear relationship.

In the linear function formula y = a*x + b, the a variable is often called the slope because, indeed, it defines the slope of the line. The b variable is called the intercept: it is the value where the plotted line intersects the y-axis. (Or in other words, the value of y is b when x = 0.)

This is all you have to know about linear functions for now, because linear regression is nothing else but finding the exact linear function equation (that is: finding the a and b values in the y = a*x + b formula) that fits your data points the best.

Note: Here's some advice if you are not 100% sure about the math. The most intuitive way to understand the linear function formula is to play around with its values. Change the a and b values in the formula, calculate the new x–y value pairs and draw the new graph. By seeing the changes in the value pairs and on the graph, sooner or later, everything will fall into place. (Tip: try out what happens when a = 0 or b = 0!)

Machine learning, just like statistics, is all about abstractions. You want to simplify reality so you can describe it with a mathematical formula. But to do so, you have to ignore natural variance, and thus compromise on the accuracy of your model.

If this sounds too theoretical or philosophical, here's a typical linear regression example! We have 20 students in a class and we have data about a specific exam they have taken. Each student is represented by a blue dot on a scatter plot: the X axis shows how many hours a student studied for the exam, and the Y axis shows the score she eventually got. E.g. one student studied 24 hours and her test result was 58%.

We have 20 data points (20 students) here. By looking at the whole data set, you can intuitively tell that there is a correlation between the two factors: if one studies more, she'll get better results on her exam. But you can see the natural variance, too. For instance, three students who each studied for ~30 hours got very different scores: 74%, 65% and 40%.

Anyway, let's fit a line to our data set using linear regression.
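Here is a minimal sketch of such a fit with numpy's polyfit. The actual scores of the 20 students are not reproduced in this text. The hours and scores below are therefore hypothetical values, made up so that they scatter around the line y = 2*x + 5. With deg=1, np.polyfit does a least-squares fit of a straight line and returns the coefficients highest power first, so the result unpacks as [a, b].

```python
import numpy as np

# Hypothetical data, NOT the 20 students from the scatter plot:
# hours studied and exam scores (%), scattered around y = 2*x + 5.
hours = np.array([10, 10, 20, 20, 30, 30, 40, 40])
scores = np.array([29, 21, 51, 39, 75, 55, 88, 82])

# Fit a degree-1 polynomial (a straight line) with least squares.
# polyfit returns the coefficients highest power first: [a, b].
a, b = np.polyfit(hours, scores, 1)
print(f"slope a = {a:.2f}, intercept b = {b:.2f}")
# → slope a = 2.00, intercept b = 5.00

# Use the fitted linear function y = a*x + b to estimate a score.
predicted = a * 24 + b
print(f"predicted score for 24 hours of study: {predicted:.1f}%")
# → predicted score for 24 hours of study: 53.0%
```

The made-up deviations from the line cancel out in pairs, so this fit recovers exactly a = 2 and b = 5. With real data like the students' scores, the fitted line only approximately describes each point, and that gap is the natural variance the model ignores.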

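The advice about playing around with a and b can also be tried in code. This plain-Python sketch calculates x–y value pairs for y = 2*x + 5 and for the a = 0 and b = 0 cases from the tip. The function name linear is just a hypothetical helper for illustration, not part of any library.

```python
def linear(x, a, b):
    """Return y for the linear function y = a*x + b (hypothetical helper)."""
    return a * x + b

xs = range(5)  # x = 0, 1, 2, 3, 4

for a, b in [(2, 5), (0, 5), (2, 0)]:
    ys = [linear(x, a, b) for x in xs]
    # Consecutive differences: how much y changes when x grows by 1.
    steps = [y2 - y1 for y1, y2 in zip(ys, ys[1:])]
    print(f"a={a}, b={b}: y values {ys}, change per step {set(steps)}")

# a=2, b=5: y values [5, 7, 9, 11, 13], change per step {2}
# a=0, b=5: y values [5, 5, 5, 5, 5], change per step {0}
# a=2, b=0: y values [0, 2, 4, 6, 8], change per step {2}
```

With a = 0 the line is flat (y is always b). With b = 0 it passes through the origin, since y = 0 when x = 0.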