@mayanksatnalika wrote:
Say I have a feature city with values {Delhi, Mumbai, Kolkata, etc.} and a feature population with numerical data. I want to predict a third feature (say pollution) from city and population using multiple regression. I could code each city as a number: Delhi-->0, Mumbai-->1, Kolkata-->2 and so on. But if I then apply regression, won't city be treated as an ordinary numerical value rather than a categorical one?
That does not seem correct: if Kolkata is coded as 2, Mumbai as 1 and Delhi as 0, the regression will always assume that the order of impact on the answer is either Kolkata > Mumbai > Delhi or Kolkata < Mumbai < Delhi.
What is the mathematics behind regression with categorical variables? Do we need to create new features like is_Delhi, is_Mumbai and is_Kolkata, with a 0 or 1 value for each row of the training set?
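To make the is_Delhi / is_Mumbai / is_Kolkata idea concrete, here is a minimal sketch in plain Python of what those 0/1 columns could look like. The helper name `indicator_columns`, the choice of Delhi as the baseline, and the population numbers are all made up for illustration. One category is left out as the baseline because, with an intercept in the model, the indicator columns for all categories would sum to the intercept column. The model would then be pollution = b0 + b1*population + b2*is_Kolkata + b3*is_Mumbai, where b2 and b3 are each city's shift relative to Delhi, so no ordering between cities is imposed.

```python
def indicator_columns(values, baseline):
    """Return (names, rows): one 0/1 column per category except the baseline.

    Hypothetical helper for illustration only.
    """
    # Every category other than the baseline gets its own column.
    categories = sorted(set(values) - {baseline})
    names = [f"is_{c}" for c in categories]
    # A baseline row is all zeros; any other row has a single 1.
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return names, rows


# Made-up example data, not real figures.
cities = ["Delhi", "Mumbai", "Kolkata", "Mumbai"]
population = [10.0, 20.0, 30.0, 40.0]

names, dummies = indicator_columns(cities, baseline="Delhi")

# Each design-matrix row: [intercept, population, is_Kolkata, is_Mumbai]
design_matrix = [[1.0, p] + d for p, d in zip(population, dummies)]

print(names)
for row in design_matrix:
    print(row)
```

These rows could then be passed to any ordinary least-squares routine in place of the 0/1/2 coding.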