Help reducing overfitting in a Wine Quality classification model
00:14 10 Jun 2026

I am working on the Wine Quality dataset and struggling with significant overfitting. My model performs well on the training set but fails to generalize to the test set.

I have already tried:

  1. Various algorithms (Random Forest, SVM, Logistic Regression).

  2. Tuning hyperparameters (GridSearch, adjusting max_depth, min_samples_leaf).

  3. Feature engineering (creating new ratios).

  4. Handling class imbalance (class_weight='balanced').

Despite these efforts, my training accuracy remains between 0.8 and 0.9, while my test accuracy stays stuck at around 0.65.
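For reference, here is a minimal sketch of how the train/validation gap could be measured with cross-validation instead of a single split. It is not the code from my notebook: the data is a synthetic stand-in generated with `make_classification` (11 features, noisy labels), and the helper name `train_cv_gap` and the two hyperparameter settings are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the Wine Quality data (assumption, not the real dataset):
# 11 numeric features, 3 classes, and 20% randomly flipped labels to mimic label noise.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=5, n_redundant=2,
    n_classes=3, flip_y=0.2, random_state=0,
)


def train_cv_gap(max_depth, min_samples_leaf):
    """Return (mean train accuracy, mean validation accuracy) over 5 stratified folds."""
    model = RandomForestClassifier(
        n_estimators=50,
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        class_weight="balanced",
        random_state=0,
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_validate(
        model, X, y, cv=cv, scoring="accuracy", return_train_score=True
    )
    return scores["train_score"].mean(), scores["test_score"].mean()


# Compare an unconstrained forest with a more strongly regularized one.
results = {}
for depth, leaf in [(None, 1), (4, 10)]:
    train_acc, val_acc = train_cv_gap(depth, leaf)
    results[(depth, leaf)] = (train_acc, val_acc)
    print(f"max_depth={depth}, min_samples_leaf={leaf}: "
          f"train={train_acc:.3f}, validation={val_acc:.3f}")
```

If the validation score barely moves while the training score drops as the model is constrained, that would suggest the ceiling comes from the data rather than from model capacity.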

Could anyone offer insight into whether this is due to inherent noise in the human-labeled data, or whether I should change my strategy (e.g., switching to regression, or to more advanced techniques such as XGBoost)? Any advice or alternative approaches would be greatly appreciated.
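To make the regression option concrete, this is roughly what I have in mind: treat the quality score as a number, fit a regressor, then round the predictions back to integer scores. This is only a sketch under stated assumptions. The data is synthetic (a made-up latent score rounded into the range 3 to 8), and the "within one point" metric is my own choice for illustration, not something I have evaluated on the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 11 features and an ordinal score in 3..8
# obtained by rounding a noisy latent value.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 11))
latent = 5.5 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.7, size=800)
y = np.clip(np.rint(latent), 3, 8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a regressor on the integer scores as if they were continuous.
reg = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
reg.fit(X_train, y_train)

# Round the continuous predictions back to valid integer scores.
raw_pred = reg.predict(X_test)
pred = np.clip(np.rint(raw_pred), 3, 8).astype(int)

exact_acc = accuracy_score(y_test, pred)             # same score predicted
within_one = np.mean(np.abs(pred - y_test) <= 1)     # off by at most one point
mae = mean_absolute_error(y_test, raw_pred)          # error on the raw output

print(f"exact accuracy={exact_acc:.3f}, within one point={within_one:.3f}, MAE={mae:.3f}")
```

Unlike plain classification accuracy, the within-one-point figure and the MAE distinguish a prediction of 6 for a true 5 from a prediction of 8 for a true 5.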

Thank you!

https://colab.research.google.com/drive/1R0jhClimKn1EsfFRAV-612IwmTeypQK3?usp=sharing

algorithm machine-learning random-forest overfitting-underfitting