How do I find the 'best' Countries (as a feature) in a dataset?

23:17 26 Nov 2025

I have a dataset with a 'Country' column, and for my project I can afford to remove some countries entirely if doing so gives a better RMSE score.

The countries have very different row counts: for example, there can be 1000 rows for Japan and only 50 rows for Thailand.

What is the best way to identify which countries are worth removing in order to get a better RMSE score?
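
One way to make the question concrete is a per-country error breakdown. The sketch below is only an illustration, not a definitive method: it assumes held-out (for example, out-of-fold) predictions from some model are already available, and the function name `per_country_rmse` and the sample values are hypothetical.

```python
import math
from collections import defaultdict

def per_country_rmse(countries, y_true, y_pred):
    """Return {country: (row_count, rmse)} computed from held-out predictions."""
    sq_err = defaultdict(float)  # sum of squared errors per country
    counts = defaultdict(int)    # number of rows per country
    for country, target, pred in zip(countries, y_true, y_pred):
        sq_err[country] += (target - pred) ** 2
        counts[country] += 1
    return {c: (counts[c], math.sqrt(sq_err[c] / counts[c])) for c in counts}

# Hypothetical held-out rows (made-up values, for illustration only)
countries = ["Japan", "Japan", "Japan", "Thailand", "Thailand"]
y_true = [10.0, 12.0, 11.0, 20.0, 30.0]
y_pred = [11.0, 12.0, 10.0, 26.0, 22.0]

report = per_country_rmse(countries, y_true, y_pred)

# List countries from worst to best RMSE, together with their row counts
for country, (n, rmse) in sorted(report.items(), key=lambda kv: -kv[1][1]):
    print(f"{country}: n={n}, rmse={rmse:.3f}")
```

Countries with a high RMSE and few rows would be the natural candidates for removal. One caveat: a with/without comparison is only meaningful if both models are scored on the same held-out rows from the countries that are kept, since otherwise the RMSE drops simply because the hard rows left the evaluation set.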

machine-learning feature-selection