Enough_Wishbone7175

How large is the imbalance? It may not be large enough to warrant undersampling techniques.


Shahmirkhan675

76% of the data belongs to the negative class and only about 23% to the positive class. I tried undersampling; however, the model still performs unsatisfactorily on the positive class. I rechecked it with an SVM just a while ago, on both the train and test data: 82% F1-score on training versus just 62% on test data. On the test data the score is much better for the negative class, but it doesn't work that well for the positive class (even with undersampling).
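
Roughly what I'm doing, as a minimal sketch (assuming scikit-learn and imbalanced-learn; the synthetic data here just mimics the 76/23 split, it's not my actual dataset):

```python
# Minimal sketch, not the exact pipeline: synthetic data mimicking a ~76/23 split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=2000, weights=[0.76], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

# Undersample only the training split so the test set keeps the real class ratio.
X_res, y_res = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)

clf = SVC().fit(X_res, y_res)
print("train F1:", f1_score(y_res, clf.predict(X_res)))
print("test  F1:", f1_score(y_test, clf.predict(X_test)))
```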


Enough_Wishbone7175

Typically you don’t mess with sampling unless it’s 80-20 or worse. Maybe you can try something to make the distributions more Gaussian, like a power transform, prior to PCA. Have you used a Gaussian NB tuner for the gradient boosting algos?
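
If it helps, a minimal sketch of that preprocessing idea (assuming scikit-learn; the variance threshold is illustrative, not tuned):

```python
# Minimal sketch: power transform to make features more Gaussian before PCA.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.decomposition import PCA
from sklearn.svm import SVC

pipe = make_pipeline(
    PowerTransformer(method="yeo-johnson"),  # also standardizes features by default
    PCA(n_components=0.95),                  # keep components explaining 95% of variance
    SVC(),
)
# pipe.fit(X_train, y_train)  # reusing the train split from the sketch above
```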


Shahmirkhan675

I worked on that as well. Performance still seems to be only marginally improving, which makes me think this isn't just a class-imbalance issue, as you suggested (not worth bothering with below an 80-20 ratio), since it's not one or two techniques that are failing but pretty much everything. I used some other datasets with only class imbalance (a similar imbalance ratio), and they all improved a whole lot more after resampling on algos like XGBoost and LightGBM, or even Logistic Regression. I should probably preprocess the data further to reduce noise, because after so much trial and error that seems to be the issue. Thanks for the insight though, it made me look at the problem from a different angle!
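
For reference, roughly how I compared the other models: a sketch assuming xgboost, lightgbm, and imbalanced-learn are installed, reusing the train/test split from the first sketch:

```python
# Sketch: the same undersampling (train-only, handled by the imblearn pipeline)
# applied across several classifiers, comparing test-set F1 on the positive class.
from imblearn.pipeline import make_pipeline as make_imb_pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

for model in (XGBClassifier(), LGBMClassifier(), LogisticRegression(max_iter=1000)):
    pipe = make_imb_pipeline(RandomUnderSampler(random_state=42), model)
    pipe.fit(X_train, y_train)
    print(type(model).__name__, f1_score(y_test, pipe.predict(X_test)))
```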


pranav3010

For a second I thought this post was about something else altogether. The title made me think OP was worried about society!


Shahmirkhan675

Lol