dtype='numeric' is not compatible with arrays of bytes/strings. Convert your data to numeric values explicitly instead

04:59 11 Apr 2023

I was fitting a linear regression with scikit-learn. I tried several approaches, including reshaping the arrays, but each attempt ended in an error. The dataset is:

    R&D Spend  Administration  Marketing Spend       State     Profit
0   165349.20       136897.80        471784.10    New York  192261.83
1   162597.70       151377.59        443898.53  California  191792.06
2   153441.51       101145.55        407934.54     Florida  191050.39
3   144372.41       118671.85        383199.62    New York  182901.99
4   142107.34        91391.77        366168.42     Florida  166187.94
5   131876.90        99814.71        362861.36    New York  156991.12
6   134615.46       147198.87        127716.82  California  156122.51
7   130298.13       145530.06        323876.68     Florida  155752.60
8   120542.52       148718.95        311613.29    New York  152211.77
9   123334.88       108679.17        304981.62  California  149759.96
10  101913.08       110594.11        229160.95     Florida  146121.95
11  100671.96        91790.61        249744.55  California  144259.40
12   93863.75       127320.38        249839.44     Florida  141585.52
13   91992.39       135495.07        252664.93  California  134307.35
14  119943.24       156547.42        256512.92     Florida  132602.65

I tried the following code:

import numpy as np
import pandas as pd

# Dataset: every column except the last is a feature, the last is the target
dataset = pd.read_csv(r'50_Startups.csv')
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

# Encoding categorical data (column 3 is State)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
oHe = OneHotEncoder()
ct = ColumnTransformer(transformers=[('encoder', oHe, [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype = np.str)

# Splitting into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Training the multiple linear regression model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

The error is:

ValueError: dtype='numeric' is not compatible with arrays of bytes/strings.
Convert your data to numeric values explicitly instead.
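
The error most likely comes from `dtype = np.str` in the encoding step: it casts the whole encoded matrix to strings, and `LinearRegression.fit` only accepts numeric input. (The `np.str` alias was also removed in NumPy 1.24, so on newer versions that line fails earlier with an `AttributeError`.) Below is a minimal sketch of one possible fix that casts to `float` instead. It assumes the first six rows of the table above, inlined as a DataFrame so it runs without `50_Startups.csv`. The name `X_encoded` and the `sparse_threshold=0` setting are my additions, and the train/test split is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# First six rows of the table above, inlined so the sketch runs without the CSV.
dataset = pd.DataFrame({
    "R&D Spend": [165349.20, 162597.70, 153441.51, 144372.41, 142107.34, 131876.90],
    "Administration": [136897.80, 151377.59, 101145.55, 118671.85, 91391.77, 99814.71],
    "Marketing Spend": [471784.10, 443898.53, 407934.54, 383199.62, 366168.42, 362861.36],
    "State": ["New York", "California", "Florida", "New York", "Florida", "New York"],
    "Profit": [192261.83, 191792.06, 191050.39, 182901.99, 166187.94, 156991.12],
})
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

# One-hot encode the State column (position 3) and pass the numeric columns through.
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [3])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption: always return a dense array
)

# Cast to float, not str, so the estimator receives numeric data.
X_encoded = np.asarray(ct.fit_transform(X), dtype=float)

regressor = LinearRegression()
regressor.fit(X_encoded, y)
predictions = regressor.predict(X_encoded)
```

With the full dataset, the same change should apply to the original code: replace `dtype = np.str` with `dtype = float` and keep the `train_test_split` step as it was.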

python python-3.x scikit-learn linear-regression
