Reading-Notes


Linear Regression

Introduction to linear regression

Linear regression is a model that assumes a linear relationship between the input variables (x) and the single output variable (y).

x: independent variable (explanatory variable)

y: dependent variable (response variable)

The x variable is used to predict the y variable.

Figure: simple regression

Running linear regression in Python with scikit-learn

You can do linear regression using NumPy, SciPy, statsmodels, and scikit-learn.

Source: scikit-learn

Important functions to keep in mind while fitting a linear regression model are:

lm.fit() -> fits a linear model

lm.predict() -> predicts y using the linear model with estimated coefficients

lm.score() -> returns the coefficient of determination (R^2), a measure of how well observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model.
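The three functions above can be sketched together on toy data (the values below are hypothetical, chosen so the fit is exact):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data following y = 2x + 1 exactly.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])

lm = LinearRegression()
lm.fit(X, y)                 # fits the linear model
predictions = lm.predict(X)  # predicts y using the estimated coefficients
r2 = lm.score(X, y)          # coefficient of determination R^2
```

Because the toy data is perfectly linear, R^2 comes out as 1.0 here; on real data it will be lower.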

### How to do a train-test split

You have to divide your data set randomly. Scikit-learn provides a function called train_test_split to do this.
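A minimal sketch of using train_test_split (the data and split ratio below are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 30% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```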

## Residual plots

Plotting the residuals (the differences between observed and predicted values) shows how well the model fits: a random scatter around zero suggests a linear model is appropriate.
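A minimal sketch of computing the residuals that such a plot would display (the data is hypothetical; the residuals would typically be scattered against the predicted values with matplotlib):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, slightly noisy data for illustration.
X = np.array([[1], [2], [3], [4]])
y = np.array([2.1, 3.9, 6.2, 7.8])

lm = LinearRegression().fit(X, y)
residuals = y - lm.predict(X)  # plot these against lm.predict(X)
```

With an intercept in the model, ordinary least squares residuals always sum to zero, which is a quick sanity check on the fit.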

More on residual plots

Conclusion

Linear regression is a basic and commonly used type of predictive analysis.

It is used to test the following:

  1. Does a set of predictor variables do a good job in predicting an outcome (dependent) variable?
  2. Which variables in particular are significant predictors of the outcome variable?

    a. In what way, indicated by the magnitude and sign of the beta estimates, do they impact the outcome variable?

The simplest form of the regression equation is y = c + b*x, where y = estimated dependent variable score, c = constant, b = regression coefficient, and x = score on the independent variable.
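A worked example of the equation with made-up coefficient values (c, b, and x below are purely illustrative):

```python
# Hypothetical values for illustration.
c = 2.0   # constant (intercept)
b = 0.5   # regression coefficient (slope)
x = 10.0  # score on the independent variable

y = c + b * x  # estimated dependent variable score
```

Here y evaluates to 7.0, i.e. 2.0 + 0.5 * 10.0.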

More on types of linear regressions

## Train/Test Split and Cross Validation in Python

## Overfitting/Underfitting

Data is usually split into two subsets:

  1. training data
  2. testing data

The model is fit on the training data in order to make predictions on the test data. While doing that, overfitting or underfitting might happen:
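One common way to spot overfitting is to compare the model's score on the training data against its score on the held-out test data; a large gap suggests the model has overfit. A minimal sketch on synthetic data (the data-generating process below is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: y = 3x plus a little noise.
rng = np.random.RandomState(0)
X = rng.rand(100, 1)
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Fit on the training split, then score on both splits.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
lm = LinearRegression().fit(X_train, y_train)
train_r2 = lm.score(X_train, y_train)
test_r2 = lm.score(X_test, y_test)
# Similar train and test R^2 values suggest the model generalizes;
# a much higher train_r2 than test_r2 would suggest overfitting.
```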

More on test split