In this post, we will try to understand the simple linear regression algorithm
by writing it from scratch,
and then compare the result with the implementation in scikit-learn.

We'll also cover some of the statistical terminology needed to understand the model.
For this, we'll use the Boston Housing dataset, a sample dataset from sklearn.

#### Import required Python libraries

```
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
from matplotlib import animation, rc
from IPython.display import HTML
```

```
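# Load the dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release
from sklearn.datasets import load_boston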
```

```
boston_data = load_boston()
```

```
type(boston_data)
```

boston_data is a dictionary-like object (a scikit-learn Bunch); as with a regular Python dictionary, we can access its keys and values.

```
boston_data.keys()
```

To check the feature names:

```
boston_data['feature_names']
```

To check the size of the data:

```
boston_data['data'].shape
```

There is also a description of the data:

```
print(boston_data['DESCR'])
```

We will create a Pandas DataFrame with the data from the Boston dataset:

```
df = pd.DataFrame(data=boston_data['data'])
df.columns = boston_data['feature_names']
```

We'll add the target data to this DataFrame:

```
df['Price'] = boston_data['target']
```

```
df.head()
```

We'll first try simple linear regression with a single independent variable.
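For reference, a model with a single independent variable is commonly written as shown below. The symbols ($x$ for the feature, $y$ for the target, $\beta_0$ for the intercept, $\beta_1$ for the slope, $\varepsilon$ for the error term) are notation assumed here for illustration, not taken from the post itself.

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```

In this setting, $x$ would be one column of the DataFrame and $y$ the Price column.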

Let's check the correlation of every feature with the target:

```
corr = df.corr()
```

```
corr['Price'].sort_values(ascending=False)
```

From the description, RM is the average number of rooms per dwelling. Among all the features, it has the strongest positive correlation with the housing price.
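To make the numbers returned by `df.corr()` less of a black box, here is a minimal from-scratch sketch of the Pearson correlation coefficient, which is the default method pandas uses. The helper name `pearson_r` and the toy values are made up for illustration and are not taken from the Boston data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either sequence is constant
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values: more rooms, higher price
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12.0, 17.0, 21.0, 30.0, 41.0]
print(round(pearson_r(rooms, price), 3))
```

A value near 1 means the two variables rise together almost linearly, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.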

We can also view the correlations qualitatively as a heatmap:

```
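import seaborn as sns
f, ax = plt.subplots(figsize=(10, 8))
# The all-False mask hides nothing, so the full correlation matrix is drawn.
# np.bool was removed in NumPy 1.24; the builtin bool works the same way here.
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool),
            cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax)
plt.show()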
```

The bottom-most row shows, as colored squares, the correlation of each feature with Price.

```
from pylab import rcParams
rcParams['figure.figsize'] = 20, 15
```

```
from pandas.plotting import scatter_matrix
axes = scatter_matrix(df, alpha=0.5, diagonal='kde')
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() returns the same array
corr = df.corr().to_numpy()
# Annotate each panel above the diagonal with its correlation coefficient
for i, j in zip(*np.triu_indices_from(axes, k=1)):
    axes[i, j].annotate("%.3f" % corr[i, j], (0.8, 0.8), xycoords='axes fraction', ha='center', va='center')
plt.show()
```