@somanadha_sastry wrote:
Hello community,
To guard against overfitting and underfitting, one common practice is to divide the data into train and test samples. My question is at what stage we should do the split: before EDA or after EDA? I ask because EDA usually includes missing-value treatment and outlier treatment, and for both we look at the whole dataset. If we split after EDA, the values used to fill gaps or cap outliers were computed with the test rows included, so the model has indirectly seen the whole data, which compromises the basic purpose of splitting. Yet in many places I have seen the split done after EDA. Please help me clear up this ambiguity.
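To make the concern concrete, here is a minimal sketch of the split-first ordering I am asking about. Everything in it is assumed for illustration: the toy `values` column, the 80/20 split, median imputation, and 1.5 × IQR capping are hypothetical choices, not a recommended recipe.

```python
import random
import statistics

# Hypothetical toy column with missing values (None); not real data.
values = [4.0, None, 5.5, 6.1, None, 5.0, 120.0, 4.8, 5.2, 5.9]

# 1. Split first, so the test rows play no part in any fitted statistic.
rng = random.Random(0)
indices = list(range(len(values)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train = [values[i] for i in indices[:cut]]
test = [values[i] for i in indices[cut:]]

# 2. Learn the fill value and the outlier caps from the training rows only.
observed = [v for v in train if v is not None]
fill_value = statistics.median(observed)
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr


def treat(column):
    """Fill missing values, then clip to the caps learned from train."""
    filled = [fill_value if v is None else v for v in column]
    return [min(max(v, lower), upper) for v in filled]


# 3. Apply the same train-derived treatment to both parts.
train_clean = treat(train)
test_clean = treat(test)
```

Doing steps 2 and 3 on the full column before step 1 is the ordering I am worried about, because then `fill_value`, `lower`, and `upper` would depend on the test rows.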