Error using sklearn and linear regression: shapes (1,16) and (1,1) not aligned: 16 (dim 1) != 1 (dim 0)

I wanted to learn about machine learning, stumbled upon Siraj's YouTube channel and his Udacity videos, and wanted to try to pick up a few things.

His video in reference: https://www.youtube.com/watch?v=vOppzHpvTiQ&index=1&list=PL2-dafEMk2A7YdKv4XfKpfbTH5z6rEEj3

In his video, he imported and read a txt file, but when I tried to recreate the txt file it couldn't be read correctly. Instead, I created a pandas DataFrame with the same data and ran the linear regression fit/predict on it, but then I got the error below.

Found input variables with inconsistent numbers of samples: [1, 16], plus a message about passing 1d arrays and needing to reshape them.

Then when I tried to reshape them following this post: Sklearn : ValueError: Found input variables with inconsistent numbers of samples: [1, 6]

I got this error:

shapes (1,16) and (1,1) not aligned: 16 (dim 1) != 1 (dim 0)

This is my code below. I know it's probably a syntax error; I'm just not familiar with sklearn yet and would like some help.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

#DF = pd.read_fwf('BrainBodyWeight.txt')
DF = pd.DataFrame()
DF['Brain'] = [3.385, .480, 1.350, 465.00,36.330, 27.660, 14.830, 1.040, 4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500 ]

DF['Body'] = [44.500, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,58, 6.40, 4, 5.7,6.6, .140,1, 10.8]

try:
    x = DF['Brain']
    y = DF['Body']

    x = x.tolist()
    y = y.tolist()

    x = np.asarray(x)
    y = np.asarray(y)


    body_reg = linear_model.LinearRegression()
    body_reg.fit(x.reshape(-1,1),y.reshape(-1,1))
    plt.scatter(x,y)
    plt.plot(x,body_reg.predict(x))
    plt.show()
except Exception as e:
    print(e)

Can anyone explain why sklearn doesn't like my input?


According to the documentation, LinearRegression.fit() requires an X array of shape [n_samples, n_features]. That is why you reshape your x array before calling fit: without the reshape it has shape (16,), which has no n_features dimension and so does not meet the required [n_samples, n_features] shape.

x = DF['Brain']
x = x.tolist()
x = np.asarray(x)

# 16 samples, no feature dimension
x.shape
(16,)

# 16 samples, 1 feature
x.reshape(-1,1).shape
(16,1)

The same requirement applies to LinearRegression.predict, so you simply need to do the same reshaping when calling predict:

plt.plot(x,body_reg.predict(x.reshape(-1,1)))
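To see where the exact wording of the error in the question comes from, here is a minimal NumPy-only sketch. It assumes (as the message suggests, though the behaviour depends on the scikit-learn version) that the 1-D x was treated as a single row of shape (1, 16) and multiplied against a (1, 1) coefficient matrix. The data and the coefficient value are placeholders, not the fitted model.

```python
import numpy as np

x = np.arange(16, dtype=float)   # stand-in for the 16 brain weights, shape (16,)
coef_t = np.array([[0.5]])       # placeholder (1, 1) coefficient matrix, not a fitted value

# A 1-D input interpreted as one sample with 16 features: shape (1, 16)
x_as_row = x.reshape(1, -1)
try:
    np.dot(x_as_row, coef_t)     # inner dimensions 16 and 1 do not match
    error_message = None
except ValueError as e:
    error_message = str(e)
    print(error_message)

# The same data as 16 samples with 1 feature: shape (16, 1)
x_as_column = x.reshape(-1, 1)
product = np.dot(x_as_column, coef_t)
print(product.shape)
```

With the column layout the inner dimensions agree (1 and 1), so the product has one row per sample.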

Alternatively, you can reshape the x array once, before calling any of these functions.
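A minimal sketch of reshaping once up front, so that fit and predict both receive the same 2-D array. It uses only the first six brain/body values from the question to keep it short; the names X, brain, body and predicted are just illustrative.

```python
import numpy as np
from sklearn import linear_model

# First six brain/body values from the question, enough to illustrate the shapes
brain = np.array([3.385, 0.480, 1.350, 465.00, 36.330, 27.660])
body = np.array([44.5, 15.5, 8.1, 423.0, 119.5, 115.0])

# Reshape once: 6 samples, 1 feature
X = brain.reshape(-1, 1)

body_reg = linear_model.LinearRegression()
body_reg.fit(X, body)              # y can stay 1-D
predicted = body_reg.predict(X)    # same 2-D X, no second reshape needed

print(X.shape, predicted.shape)

# For the plot, plt.scatter(brain, body) and plt.plot(brain, predicted)
# can then be called with the 1-D originals.
```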

And for future reference, you can get the underlying numpy array directly with DF['Brain'].values. You don't need to convert it to a list and then to a numpy array. So you can use this instead of all the conversions:

x = DF['Brain'].values.reshape(-1,1)  # 16 samples, 1 feature
y = DF['Body'].values.reshape(-1,1)

body_reg = linear_model.LinearRegression()
body_reg.fit(x, y)
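As a side note, only X has to be 2-D. scikit-learn also accepts a 1-D y of shape (n_samples,), and the shape of y carries over to the output of predict. A small sketch with made-up numbers (not the question's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # 4 samples, 1 feature
y = np.array([2.0, 4.0, 6.0, 8.0])           # illustrative targets, y = 2x

# 1-D target: predictions come back 1-D
pred_1d = LinearRegression().fit(X, y).predict(X)

# 2-D target (one column): predictions come back as one column
pred_2d = LinearRegression().fit(X, y.reshape(-1, 1)).predict(X)

print(pred_1d.shape)
print(pred_2d.shape)
```

Either form works for plotting against the original x values; the 1-D form just avoids an extra reshape.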