I used scikit-learn's `make_classification` to create datasets with various levels of class imbalance, and I am applying resampling techniques to measure their effectiveness. Based on the research I did, undersampling should improve recall at the cost of precision, but that did not happen in my case: the performance with undersampling is very similar to no resampling at all. I want to know whether I have made an error in my code, or what else could explain the behavior of undersampling here.
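Since the notebook itself isn't shown here, below is a minimal, self-contained sketch of the comparison described above (the model choice, dataset sizes, imbalance ratio, and seeds are all assumptions, not taken from the notebook). It uses plain NumPy for random undersampling so it runs with scikit-learn alone, and it applies resampling only to the training split, which is one common place where an undersampling experiment can silently degrade into "no resampling":

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical setup: 95/5 imbalance; the real notebook's parameters may differ.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: train on the imbalanced data as-is.
base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = base.predict(X_te)
print("no resampling: precision=%.3f recall=%.3f"
      % (precision_score(y_te, pred), recall_score(y_te, pred)))

# Random undersampling: keep every minority sample, subsample the majority
# class down to the same size. Note this touches ONLY the training split;
# the test split stays imbalanced so the metrics reflect deployment conditions.
rng = np.random.default_rng(0)
min_idx = np.flatnonzero(y_tr == 1)
maj_idx = rng.choice(np.flatnonzero(y_tr == 0), size=min_idx.size, replace=False)
keep = np.concatenate([min_idx, maj_idx])

us = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
pred_us = us.predict(X_te)
print("undersampling: precision=%.3f recall=%.3f"
      % (precision_score(y_te, pred_us), recall_score(y_te, pred_us)))
```

If the two printed lines come out nearly identical in your own experiment, it is worth checking that the resampled arrays (not the originals) are actually the ones passed to `fit`, and that the resampling happens after the train/test split rather than before it.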
If you are running the Kaggle notebook, you need to uncomment the required lines in cell [2177]: the first two commented lines are for no resampling, the next 3 are for SMOTE, and the next 3 are for Tomek links.
Any help is appreciated!