@veee wrote:
Hi All,
I have implemented incremental k-means clustering (which uses a cluster radius, i.e. a threshold, to create clusters) and am trying to identify fraudulent transactions in a dataset. My correlation analysis shows that 5 of the 9 features have a correlation value of more than 77%, but when I clustered on those 5 features the results were not satisfactory. So I engineered a new feature. When I use one feature from the dataset together with the new engineered feature, the results look much better. The algorithm now gives me 2 clusters: more than 85% of the fraudulent transactions fall in the first cluster and the legitimate transactions in the other. Is this approach acceptable?
Could you please suggest a method to identify the features that influence my clustering?
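To make the setup concrete, here is a minimal sketch of threshold-based incremental clustering as described above. It assumes Euclidean distance and running-mean centroid updates, which may not match the actual implementation. The function names `incremental_cluster` and `fraud_share_per_cluster` are illustrative only, and the second helper assumes fraud labels are available for evaluation.

```python
import math

def incremental_cluster(points, radius):
    """Assign each point to the nearest centroid within `radius`,
    otherwise start a new cluster (assumed behaviour, Euclidean distance)."""
    centroids = []  # running mean of each cluster
    counts = []     # number of points seen per cluster
    labels = []     # cluster index assigned to each point
    for p in points:
        # Find the nearest existing centroid, if any.
        best, best_dist = None, math.inf
        for j, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = j, d
        if best is not None and best_dist <= radius:
            # Update the centroid as a running mean.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [ci + (pi - ci) / n
                               for ci, pi in zip(centroids[best], p)]
        else:
            # No centroid is close enough: open a new cluster.
            centroids.append([float(v) for v in p])
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return labels, centroids

def fraud_share_per_cluster(labels, is_fraud):
    """Fraction of fraud-labelled points in each cluster."""
    totals, frauds = {}, {}
    for lab, f in zip(labels, is_fraud):
        totals[lab] = totals.get(lab, 0) + 1
        frauds[lab] = frauds.get(lab, 0) + int(f)
    return {lab: frauds[lab] / totals[lab] for lab in totals}
```

Running `fraud_share_per_cluster` on the labels produced by different feature subsets is one simple way to compare how well each subset separates fraudulent from legitimate transactions.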
Thanks
V