dtype='numeric' is not compatible with arrays of bytes/strings. Convert your data to numeric values explicitly instead
I am fitting a linear regression with scikit-learn. I have tried several approaches, including reshaping the arrays, but each attempt ends in an error. The dataset is:
    R&D Spend  Administration  Marketing Spend       State     Profit
0   165349.20       136897.80        471784.10    New York  192261.83
1   162597.70       151377.59        443898.53  California  191792.06
2   153441.51       101145.55        407934.54     Florida  191050.39
3   144372.41       118671.85        383199.62    New York  182901.99
4   142107.34        91391.77        366168.42     Florida  166187.94
5   131876.90        99814.71        362861.36    New York  156991.12
6   134615.46       147198.87        127716.82  California  156122.51
7   130298.13       145530.06        323876.68     Florida  155752.60
8   120542.52       148718.95        311613.29    New York  152211.77
9   123334.88       108679.17        304981.62  California  149759.96
10  101913.08       110594.11        229160.95     Florida  146121.95
11  100671.96        91790.61        249744.55  California  144259.40
12   93863.75       127320.38        249839.44     Florida  141585.52
13   91992.39       135495.07        252664.93  California  134307.35
14  119943.24       156547.42        256512.92     Florida  132602.65
I've tried the following code:
# Dataset
import numpy as np
import pandas as pd
dataset = pd.read_csv(r'50_Startups.csv')
X = dataset.iloc[:, :-1]  # every column except the last (Profit)
y = dataset.iloc[:, -1]   # the last column (Profit)
# Encoding Categorical Data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
oHe = OneHotEncoder()
# Column 3 is State; the remaining columns are passed through unchanged
ct = ColumnTransformer(transformers=[('encoder', oHe, [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype = np.str)
# Splitting into Training and Test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# Training the Multiple Linear Regression
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
The error is:
ValueError: dtype='numeric' is not compatible with arrays of bytes/strings.
Convert your data to numeric values explicitly instead.
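The message seems to point at the `dtype = np.str` cast on the encoding line: it turns every value in X, including the one-hot columns, into a string, and `LinearRegression.fit` only accepts numeric input. A minimal sketch of the same pipeline with a float cast instead is below. It is an assumption-laden illustration rather than a confirmed fix: the first six rows shown above stand in for `50_Startups.csv`, `X_encoded` is a name introduced only for this example, the train/test split is left out for brevity, and `sparse_threshold=0` is added so the transformer always returns a dense array.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Assumption: the first six rows shown above stand in for 50_Startups.csv.
dataset = pd.DataFrame({
    "R&D Spend": [165349.20, 162597.70, 153441.51, 144372.41, 142107.34, 131876.90],
    "Administration": [136897.80, 151377.59, 101145.55, 118671.85, 91391.77, 99814.71],
    "Marketing Spend": [471784.10, 443898.53, 407934.54, 383199.62, 366168.42, 362861.36],
    "State": ["New York", "California", "Florida", "New York", "Florida", "New York"],
    "Profit": [192261.83, 191792.06, 191050.39, 182901.99, 166187.94, 156991.12],
})
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

# One-hot encode column 3 (State) and pass the numeric columns through.
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [3])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption: force dense output so the cast below is safe
)

# Cast to float instead of np.str so the estimator receives numeric data.
X_encoded = np.asarray(ct.fit_transform(X), dtype=float)

regressor = LinearRegression()
regressor.fit(X_encoded, y)
```

With this cast the array handed to `fit` is `float64`, so the numeric dtype check should pass. Separately, `np.str` was only an alias for the built-in `str` and has been removed from recent NumPy releases, so that line can fail on its own in a newer environment.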