ValueError: Expected 2D array, got 1D array instead:

Solution 1:

The fit and predict methods both expect X as a 2D array, but your x_train and x_test are currently 1D, so reshape them into a single column. Reshaping y_train as well is optional, since fit also accepts a 1D target. What the error message suggests should work:

x_train= x_train.reshape(-1, 1)
y_train= y_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)

This uses NumPy's reshape. Questions about reshape have been answered in the past. This one, for example, explains what reshape(-1, 1) means: What does -1 mean in numpy reshape?
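As a minimal sketch of the fix above, the snippet below reproduces the error on made-up data and then applies the reshape. The data values (y = 2x + 1) and the names x_train_2d, x_test_2d and raised are assumptions for illustration, not taken from the question. The target is left 1D here, which fit also accepts.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2x + 1, purely for illustration.
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = 2.0 * x_train + 1.0
x_test = np.array([6.0, 7.0])

regressor = LinearRegression()

# Passing the 1D array straight to fit raises a ValueError,
# because X is expected to have shape (n_samples, n_features).
try:
    regressor.fit(x_train, y_train)
    raised = False
except ValueError as err:
    raised = True
    print("fit failed:", err)

# Reshape the inputs into a single column: shape (n_samples, 1).
x_train_2d = x_train.reshape(-1, 1)
x_test_2d = x_test.reshape(-1, 1)

regressor.fit(x_train_2d, y_train)       # the 1D target is accepted as-is
y_pred = regressor.predict(x_test_2d)

print(x_train_2d.shape, x_test_2d.shape, y_pred.shape)
```

With a 1D target, predict returns a 1D array with one value per test sample.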

Solution 2:

The scikit-learn documentation for LinearRegression describes the inputs as follows:

fit(X, y, sample_weight=None)

X : numpy array or sparse matrix of shape [n_samples,n_features]

predict(X)

X : {array-like, sparse matrix}, shape = (n_samples, n_features)

As you can see, X has two dimensions, whereas your x_train and x_test have only one. As suggested, add the following before fitting the model and predicting:

x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)

Solution 3:

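If x_test is a single scalar value, wrap it in two pairs of brackets so that predict receives a 2D array of shape (1, 1):

y_pred = regressor.predict([[x_test]])

If x_test is already a 1D array, this produces a 3D input instead, so use x_test.reshape(-1, 1) in that case.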
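To see what the double brackets do to the shape, here is a small NumPy-only sketch. The values and the variable names (x_scalar, x_array and so on) are hypothetical and chosen only to show the resulting shapes.

```python
import numpy as np

# A single value: double brackets give one sample with one feature.
x_scalar = 6.5
wrapped_scalar = np.array([[x_scalar]])

# A 1D array: double brackets add two leading axes, giving a 3D array.
x_array = np.array([6.0, 7.0, 8.0])
wrapped_array = np.array([[x_array]])

# For a 1D array, reshape(-1, 1) gives the (n_samples, 1) layout instead.
reshaped_array = x_array.reshape(-1, 1)

print(wrapped_scalar.shape)
print(wrapped_array.shape)
print(reshaped_array.shape)
```

Only the first and last results have the two-dimensional (n_samples, n_features) layout that predict expects.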

Solution 4:

A lot of times when doing linear regression problems, people like to envision a graph like this one:

[Figure: one-variable-input linear regression]

Here the input is a single variable, X = [1, 2, 3, 4, 5].

However, many regression problems have multidimensional inputs. Consider the prediction of housing prices. It is not one attribute that determines housing prices, but multiple features (e.g. number of rooms, location, etc.).

If you look at the documentation, you will see this:

[Figure: screenshot from the documentation]

It tells us that the rows are the samples and the columns are the features.

[Figure: description of the input]

However, consider what happens when we have one feature as our input. Then we need an n x 1 input, where n is the number of samples and the single column represents our only feature.

Why does the array.reshape(-1, 1) suggestion work? The -1 tells NumPy to choose the number of rows that fits, based on the array's size and the number of columns provided, so a 1D array of n values becomes an n x 1 array.

[Figure: transformation using array.reshape]
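The way -1 is inferred can be checked directly in NumPy. The sketch below uses a made-up array of six values. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(1, 7)            # six values: 1, 2, 3, 4, 5, 6

column = a.reshape(-1, 1)      # one column, so NumPy infers 6 rows
pairs = a.reshape(-1, 2)       # two columns, so NumPy infers 3 rows
single_row = a.reshape(1, -1)  # one row, so NumPy infers 6 columns

# If the size is not divisible by the given dimension, no row count
# fits and NumPy raises a ValueError.
try:
    a.reshape(-1, 4)
    inference_failed = False
except ValueError:
    inference_failed = True

print(column.shape, pairs.shape, single_row.shape, inference_failed)
```

For a single feature, reshape(-1, 1) is the form to use. reshape(1, -1) is the counterpart for a single sample with several features.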

Solution 5:

I would suggest reshaping X at the beginning, before you split the data into training and test sets:

import pandas as pd
import matplotlib as pt

# Import the data set
dataset = pd.read_csv('Sample-data-sets-for-linear-regression1.csv')
x = dataset.iloc[:, 1].values
y = dataset.iloc[:, 2].values
# Here is the trick: turn x into a 2D column of shape (n_samples, 1)
x = x.reshape(-1, 1)

# Split the dataset into a training set and a test set.
# train_test_split lives in sklearn.model_selection;
# the old sklearn.cross_validation module was removed in scikit-learn 0.20.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Linear regression
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(x_train, y_train)

y_pred = regressor.predict(x_test)