@miotot wrote:
Hi Gurus,
As I am new to this tool and to data analytics in general, I just want to understand how factors work in R (or perhaps in a more general sense).
For example, I have this set of data:

"indicator" "countrycode" "distance"
0           US            0.1
0           US            0.18
0           US            0.21
0           US            0.19
1           US            0.2
1           US            0.21
0           GB            0.24
0           GB            0.23
0           GB            0.21
0           GB            0.22
1           GB            0.2
1           FR            0.1
and I want to fit a logistic regression model to predict the indicator.
myFullLRModel = glm(indicator ~ countrycode + distance, data=myraw.data, family=binomial)

So I got the result below, which leaves out the coefficient for FR, as I expected.
Call:
glm(formula = indicator ~ countrycode + distance, family = binomial, data = myraw.data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0474  -0.7889  -0.6405   0.3285   1.9414

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)      15.97    3956.18   0.004    0.997
countrycodeGB   -20.88    3956.18  -0.005    0.996
countrycodeUS   -19.63    3956.18  -0.005    0.996
distance         15.92      28.81   0.552    0.581

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 15.276  on 11  degrees of freedom
Residual deviance: 12.256  on  8  degrees of freedom
AIC: 20.256

Number of Fisher Scoring iterations: 16
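For anyone who wants to reproduce the fit above, here is a minimal sketch that rebuilds the twelve rows and refits the model. The names `myraw.data` and `myFullLRModel` come from the post; storing `countrycode` as a plain character column and wrapping the call in `suppressWarnings()` are assumptions made for illustration.

```r
# Rebuild the data shown in the post (column names as given there).
myraw.data <- data.frame(
  indicator   = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1),
  countrycode = c("US", "US", "US", "US", "US", "US",
                  "GB", "GB", "GB", "GB", "GB", "FR"),
  distance    = c(0.1, 0.18, 0.21, 0.19, 0.2, 0.21,
                  0.24, 0.23, 0.21, 0.22, 0.2, 0.1)
)

# FR has a single row, so glm() may warn about fitted probabilities of 0 or 1.
# suppressWarnings() only hides that message; it does not change the fit.
myFullLRModel <- suppressWarnings(
  glm(indicator ~ countrycode + distance, data = myraw.data, family = binomial)
)

# List the coefficient names to see which country has no term of its own.
names(coef(myFullLRModel))
```

Comparing the names returned by `coef()` with `unique(myraw.data$countrycode)` shows directly which country the model treats as the baseline.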
Now my question: when I swapped the country labels (GB to FR and FR to GB), I expected the model to leave out the coefficient for GB and show one for country code “FR” instead. But the results show otherwise (see below):

"indicator" "countrycode" "distance"
0           US            0.1
0           US            0.18
0           US            0.21
0           US            0.19
1           US            0.2
1           US            0.21
0           FR            0.24
0           FR            0.23
0           FR            0.21
0           FR            0.22
1           FR            0.2
1           GB            0.1

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)     -4.903      6.490  -0.755    0.450
countrycodeGB   20.877   3956.182   0.005    0.996
countrycodeUS    1.247      1.689   0.738    0.460
distance        15.916     28.809   0.552    0.581
Am I doing something wrong? Or, if this is the expected result, could you explain why? This is just for my own understanding and reference. Thanks.
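One thing that may be worth checking, offered as a sketch rather than a definitive diagnosis: `factor()` orders its levels by sorting the distinct values, not by order of appearance or frequency, and with R's default treatment contrasts the first level is the baseline that gets no coefficient. The vector below is hypothetical and only reuses the three country codes from the post; `relevel()` is one way to pick a different baseline.

```r
# Hypothetical vector using the same three country codes as the post.
countrycode <- c("US", "GB", "FR", "FR")

# factor() sorts the distinct values, so the order in the data does not matter.
cc <- factor(countrycode)
levels(cc)

# relevel() moves the chosen level to the front, which makes it the baseline
# under the default treatment contrasts.
cc_gb <- relevel(cc, ref = "GB")
levels(cc_gb)
```

If that is what is happening here, FR sorts first in both versions of the data, which would explain why only the GB and US coefficients appear each time.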