How can I reduce overfitting in a Wine Quality classification model?

00:14 10 Jun 2026

I am working on the Wine Quality dataset and struggling with significant overfitting. My model performs well on the training set but generalizes poorly to the test set.

I have already tried:

Various algorithms (Random Forest, SVM, Logistic Regression).
Tuning hyperparameters (GridSearch, adjusting max_depth, min_samples_leaf).
Feature engineering (creating new ratios).
Handling class imbalance (class_weight='balanced').

Despite these efforts, my training accuracy remains around 0.8-0.9 while my test accuracy stays stuck at 0.65.
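
For context, the kind of setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the data is a synthetic stand-in generated with `make_classification` (11 features, imbalanced classes, some label noise), and the grid values are assumptions chosen only for the example. It reports training, cross-validated, and test accuracy side by side, which is the gap in question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in (NOT the real Wine Quality data): 11 features,
# 3 imbalanced classes, and 10% randomly flipped labels to mimic noise.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_redundant=2,
    n_classes=3, weights=[0.2, 0.6, 0.2], flip_y=0.1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid over the two hyperparameters mentioned in the post.
param_grid = {"max_depth": [3, 6, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=0
    ),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

train_acc = search.score(X_train, y_train)  # accuracy on data the model saw
cv_acc = search.best_score_                 # mean held-out accuracy across folds
test_acc = search.score(X_test, y_test)     # accuracy on the untouched test split

print("best params:", search.best_params_)
print(f"train={train_acc:.3f}  cv={cv_acc:.3f}  test={test_acc:.3f}")
```

If the cross-validated score sits close to the test score while the training score is much higher, the test number is probably an honest estimate and the training number is the optimistic one.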

Could anyone offer insight into whether this gap is due to inherent noise in the human-assigned quality labels, or whether I should change my strategy (e.g., switching to regression, or trying more advanced techniques like XGBoost)? Any advice or alternative approaches would be greatly appreciated.
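
To make the regression option concrete, here is a small sketch of how continuous predictions could be scored against integer quality grades. The helper name `ordinal_report`, the 0 to 10 score range, and the sample numbers are all made up for illustration; none of them come from the notebook. It uses only the standard library and reports exact-match accuracy, "within one grade" accuracy, and mean absolute error.

```python
import math


def ordinal_report(y_true, y_pred, lo=0, hi=10):
    """Score continuous predictions against integer grades.

    lo/hi are an assumed valid score range used to clip rounded predictions.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    # Round half up (math.floor(p + 0.5)) instead of round(), which rounds
    # halves to the nearest even integer in Python.
    rounded = [min(hi, max(lo, math.floor(p + 0.5))) for p in y_pred]

    n = len(y_true)
    exact = sum(t == r for t, r in zip(y_true, rounded)) / n
    within_one = sum(abs(t - r) <= 1 for t, r in zip(y_true, rounded)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"exact": exact, "within_one": within_one, "mae": mae}


# Made-up grades and regression outputs, for illustration only.
y_true = [5, 6, 6, 7, 5, 4, 6, 8]
y_pred = [5.4, 5.6, 6.2, 6.3, 5.9, 5.2, 6.1, 6.8]

report = ordinal_report(y_true, y_pred)
print(report)
```

A low exact-match accuracy combined with a high within-one accuracy would suggest that most errors are between neighbouring grades, which is what noisy human labels on an ordinal scale would tend to produce.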

Thank you!

https://colab.research.google.com/drive/1R0jhClimKn1EsfFRAV-612IwmTeypQK3?usp=sharing

algorithm machine-learning random-forest overfitting-underfitting
