Channel: Data Science, Analytics and Big Data discussions - Latest topics

Regression with categorical variables working


@mayanksatnalika wrote:

Say I have a feature city with values {Delhi, Mumbai, Kolkata, etc.} and a feature population holding numerical data. I want to predict a third variable (say pollution) from city and population using multiple regression. I could encode each city as a number: Delhi-->0, Mumbai-->1, Kolkata-->2 and so on. But if I then apply regression, won't the city code be treated as an ordinary numerical value rather than a categorical one?
That does not seem correct: if Kolkata is coded as 2, Mumbai as 1 and Delhi as 0, the regression will always assume that the order of impact on the answer is Kolkata > Mumbai > Delhi or Kolkata < Mumbai < Delhi.
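
The concern can be checked on a small example. The sketch below is only an illustration: the numbers are made-up toy values (not real pollution data), and the names `codes`, `predict`, `gap_delhi_mumbai` and `gap_mumbai_kolkata` are hypothetical. It fits ordinary least squares with the integer city code as a single numeric column and shows that the fitted model shifts the prediction by the same amount for every step in the code, so Mumbai is forced to sit exactly halfway between Delhi and Kolkata.

```python
import numpy as np

# Made-up toy data, purely for illustration (not real measurements).
codes = np.array([0, 0, 1, 1, 2, 2], dtype=float)  # Delhi=0, Mumbai=1, Kolkata=2
population = np.array([1.0, 2.0, 1.5, 2.5, 1.2, 2.2])
pollution = np.array([9.0, 10.0, 3.0, 4.1, 6.0, 7.2])

# Design matrix: intercept, integer city code, population.
X = np.column_stack([np.ones_like(codes), codes, population])
coef, *_ = np.linalg.lstsq(X, pollution, rcond=None)
b0, b_city, b_pop = coef

def predict(code, pop):
    """Prediction of the fitted linear model for one city code and population."""
    return b0 + b_city * code + b_pop * pop

# At a fixed population, each step in the code changes the prediction by b_city,
# no matter which two cities are being compared.
gap_delhi_mumbai = predict(1, 2.0) - predict(0, 2.0)
gap_mumbai_kolkata = predict(2, 2.0) - predict(1, 2.0)
print(gap_delhi_mumbai, gap_mumbai_kolkata, b_city)
```

In the toy data Mumbai has the lowest pollution values, yet a single coefficient on the code cannot represent that, because it can only describe a straight-line trend across 0, 1, 2.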

What is the mathematics behind regression with categorical variables? Do we need to create new features like is_Delhi, is_Mumbai and is_Kolkata, with a 0 or 1 value for each training example?
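
A minimal sketch of the indicator-feature idea from the question, assuming ordinary least squares and the same kind of made-up toy data (the values and the names `is_mumbai`, `is_kolkata`, `b_mumbai`, `b_kolkata` are illustrative, not from any real dataset). One common convention, assumed here, is to keep an intercept and drop one indicator (Delhi) as the baseline: with an intercept plus all three indicators the columns would be linearly dependent, since the indicators sum to the intercept column. The model is then

pollution = b0 + b_mumbai * is_Mumbai + b_kolkata * is_Kolkata + b_pop * population

so each city gets its own offset relative to Delhi, and no ordering between cities is imposed.

```python
import numpy as np

# Made-up toy data, purely for illustration (not real measurements).
cities = np.array(["Delhi", "Delhi", "Mumbai", "Mumbai", "Kolkata", "Kolkata"])
population = np.array([1.0, 2.0, 1.5, 2.5, 1.2, 2.2])
pollution = np.array([9.0, 10.0, 3.0, 4.1, 6.0, 7.2])

# 0/1 indicator columns; Delhi is the baseline and gets no column of its own.
is_mumbai = (cities == "Mumbai").astype(float)
is_kolkata = (cities == "Kolkata").astype(float)

# Design matrix: intercept, two indicators, population.
X = np.column_stack([np.ones(len(cities)), is_mumbai, is_kolkata, population])
coef, *_ = np.linalg.lstsq(X, pollution, rcond=None)
b0, b_mumbai, b_kolkata, b_pop = coef

# b_mumbai and b_kolkata are each city's offset from Delhi at equal population.
print("Delhi baseline intercept:", b0)
print("Mumbai offset vs Delhi:  ", b_mumbai)
print("Kolkata offset vs Delhi: ", b_kolkata)
print("Population slope:        ", b_pop)
```

Because each offset is estimated separately, the fit is free to place Mumbai below both Delhi and Kolkata, which a single coefficient on the codes 0, 1, 2 cannot do.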

Posts: 1

Participants: 1
