@rohit.haritash wrote:
I have 20 predictors, a mix of categorical and numerical types. Some of the categorical variables have more than 20 levels. When I check the correlation between the numeric predictors and the target, it shows no linear relationship or only a very weak one.
I ran a regression using all variables and got the following:
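For readers following along, the correlation check described above might look roughly like the sketch below. The data frame toy is a made-up stand-in (the real trainData is not shown in the post), and the column values are invented purely for illustration.

```r
# Hypothetical stand-in for trainData; values are simulated, not real.
set.seed(42)
toy <- data.frame(
  Absenteeism.time.in.hours = rexp(200, rate = 1 / 5),
  Age    = rnorm(200, mean = 36, sd = 6),
  Weight = rnorm(200, mean = 79, sd = 12),
  Reason.for.absence = factor(sample(1:28, 200, replace = TRUE))
)

target   <- "Absenteeism.time.in.hours"
# Keep only numeric columns, then drop the target itself.
num_cols <- names(toy)[sapply(toy, is.numeric)]
num_cols <- setdiff(num_cols, target)

# Pearson correlation of each numeric predictor with the target.
cors <- sapply(toy[num_cols], function(x) cor(x, toy[[target]]))
print(round(cors, 3))
```

Note that Pearson correlation only measures linear association and says nothing about the categorical predictors, so a weak value here does not by itself mean a variable is useless.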
mod1 <- lm(Absenteeism.time.in.hours~.,trainData)

Call:
lm(formula = Absenteeism.time.in.hours ~ ., data = trainData)

Residuals:
    Min      1Q  Median      3Q     Max
-6.5735 -1.0466  0.0035  0.9400 10.4666

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                      6.01680    1.28457   4.684 3.72e-06 ***
Reason.for.absence3              0.60749    2.56512   0.237 0.812897
Reason.for.absence5             -0.96714    1.59901  -0.605 0.545586
Reason.for.absence6             -2.13411    1.23030  -1.735 0.083481 .
Reason.for.absence7             -2.35854    1.04951  -2.247 0.025098 *
Reason.for.absence8             -2.29507    1.25710  -1.826 0.068551 .
Reason.for.absence9              2.50961    1.64525   1.525 0.127861
Reason.for.absence10            -1.38623    0.93321  -1.485 0.138118
Reason.for.absence11            -1.61165    0.97707  -1.649 0.099739 .
Reason.for.absence12            -3.61717    1.32533  -2.729 0.006593 **
Reason.for.absence13            -1.97801    0.85425  -2.316 0.021027 *
Reason.for.absence14            -3.32432    0.98090  -3.389 0.000762 ***
Reason.for.absence15            -0.87024    1.87247  -0.465 0.642326
Reason.for.absence16            -6.86905    2.59794  -2.644 0.008474 **
Reason.for.absence17            -1.53171    2.59252  -0.591 0.554934
Reason.for.absence18            -0.20789    1.00844  -0.206 0.836766
Reason.for.absence19            -0.81656    0.90047  -0.907 0.364978
Reason.for.absence20            -7.62779    1.87942  -4.059 5.81e-05 ***
Reason.for.absence21            -1.46102    1.33881  -1.091 0.275722
Reason.for.absence22            -0.32663    0.88390  -0.370 0.711905
Reason.for.absence23            -5.15410    0.79753  -6.463 2.64e-10 ***
Reason.for.absence24            -0.54855    2.56299  -0.214 0.830622
Reason.for.absence25            -4.09087    0.89925  -4.549 6.91e-06 ***
Reason.for.absence26            -1.47663    0.90940  -1.624 0.105120
Reason.for.absence27            -4.94175    0.88175  -5.604 3.62e-08 ***
Reason.for.absence28            -5.69586    0.81330  -7.003 8.98e-12 ***
Month.of.absence2                0.88142    0.59180   1.489 0.137071
Month.of.absence3                1.43892    0.61833   2.327 0.020395 *
Month.of.absence4                1.05853    0.90034   1.176 0.240324
Month.of.absence5                0.03979    0.91919   0.043 0.965490
Month.of.absence6                0.95497    0.90799   1.052 0.293474
Month.of.absence7                2.45868    1.10621   2.223 0.026730 *
Month.of.absence8                1.87651    1.16090   1.616 0.106694
Month.of.absence9                2.17014    1.15453   1.880 0.060788 .
Month.of.absence10               1.70371    1.11433   1.529 0.126979
Month.of.absence11               1.06564    1.04628   1.019 0.308974
Month.of.absence12               1.78648    0.96418   1.853 0.064548 .
Day.of.the.week3                 0.22495    0.34200   0.658 0.511037
Day.of.the.week4                 0.08878    0.34473   0.258 0.796875
Day.of.the.week5                 0.11849    0.35653   0.332 0.739777
Day.of.the.week6                -0.18830    0.36319  -0.518 0.604380
Seasons2                         0.74861    0.84760   0.883 0.377589
Seasons3                         1.45558    0.76891   1.893 0.058984 .
Seasons4                         0.39722    0.70809   0.561 0.575089
Transportation.expense           0.25981    0.14928   1.740 0.082453 .
Distance.from.Residence.to.Work -0.12857    0.18140  -0.709 0.478838
Service.time                     0.22001    0.20770   1.059 0.290038
Age                             -0.26143    0.17197  -1.520 0.129144
Work.load.Average.day            0.16362    0.14279   1.146 0.252466
Hit.target                      -0.01859    0.18278  -0.102 0.919051
Disciplinary.failure1           -0.76155    1.79424  -0.424 0.671444
Education2                      -0.14898    0.54931  -0.271 0.786351
Education3                      -0.90364    0.48861  -1.849 0.065044 .
Education4                      -1.65052    1.79884  -0.918 0.359342
Son                              0.30184    0.13123   2.300 0.021894 *
Social.drinker1                  0.18587    0.36431   0.510 0.610168
Social.smoker1                   0.41938    0.52089   0.805 0.421173
Pet                             -0.23407    0.17192  -1.362 0.174015
Weight                           0.07591    1.54838   0.049 0.960923
Height                           0.04431    0.66029   0.067 0.946530
Body.mass.index                 -0.15359    1.46924  -0.105 0.916792
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.379 on 457 degrees of freedom
Multiple R-squared:  0.561,  Adjusted R-squared:  0.5034
F-statistic: 9.733 on 60 and 457 DF,  p-value: < 2.2e-16
Because the adjusted R-squared is low and most predictors are not significant, I tried adding all pairwise interactions automatically (the .^2 formula). This gives a much higher R-squared and smaller residuals, but it creates more than 3,000 interaction terms.
Call:
lm(formula = Absenteeism.time.in.hours ~ .^2, data = trainData)

Residuals:
 Min   1Q Median   3Q  Max
-2.5  0.0    0.0  0.0  2.5

Coefficients: (981 not defined because of singularities)
                                         Estimate Std. Error t value Pr(>|t|)
(Intercept)                             4.544e+04  1.563e+04   2.907  0.00630 **
Reason.for.absence3                    -7.661e+04  2.679e+04  -2.860  0.00710 **
Reason.for.absence5                     5.397e+04  1.910e+04   2.825  0.00775 **
Reason.for.absence6                    -3.851e+04  1.306e+04  -2.950  0.00564 **
Reason.for.absence7                    -7.240e+02  2.875e+02  -2.518  0.01653 *
Reason.for.absence8                     1.230e+04  5.555e+03   2.215  0.03340 *
Reason.for.absence9                     1.230e+04  5.555e+03   2.215  0.03340 *
Reason.for.absence10                    4.106e+04  1.655e+04   2.481  0.01803 *
Reason.for.absence11                   -6.060e+04  2.089e+04  -2.901  0.00639 **
Reason.for.absence12                    1.750e+04  6.150e+03   2.846  0.00736 **
Reason.for.absence13                   -1.374e+05  4.993e+04  -2.752  0.00932 **
Reason.for.absence14                   -2.734e+04  8.978e+03  -3.045  0.00440 **
Reason.for.absence15                   -3.418e+04  1.144e+04  -2.988  0.00511 **
Reason.for.absence16                   -2.823e+04  9.312e+03  -3.032  0.00455 **
Reason.for.absence17                   -2.889e+04  9.804e+03  -2.947  0.00568 **
Reason.for.absence18                   -9.907e+04  3.277e+04  -3.023  0.00466 **
Reason.for.absence19                   -5.689e+04  1.933e+04  -2.943  0.00573 **
Reason.for.absence20                   -6.395e+04  2.133e+04  -2.998  0.00497 **
Reason.for.absence21                   -7.894e+04  2.717e+04  -2.905  0.00632 **
Reason.for.absence22                    1.660e+06  6.382e+05   2.601  0.01353 *
Reason.for.absence23                    1.244e+04  6.789e+03   1.833  0.07536 .
Reason.for.absence24                   -5.574e+04  1.879e+04  -2.966  0.00541 **
Reason.for.absence25                    4.663e+03  1.590e+03   2.932  0.00589 **
Reason.for.absence26                   -1.971e+04  6.264e+03  -3.146  0.00337 **
Reason.for.absence27                   -6.624e+04  2.221e+04  -2.983  0.00517 **
Reason.for.absence28                   -4.295e+04  1.460e+04  -2.942  0.00575 **
Month.of.absence2                      -6.860e+04  2.381e+04  -2.881  0.00673 **
Month.of.absence3                      -4.356e+04  1.493e+04  -2.918  0.00612 **
Month.of.absence4                      -3.054e+03  1.196e+03  -2.553  0.01518 *
Month.of.absence5                      -4.884e+04  1.745e+04  -2.798  0.00830 **
Month.of.absence6                      -1.420e+03  8.899e+02  -1.595  0.11959
Month.of.absence7                      -5.203e+04  1.796e+04  -2.896  0.00647 **
Month.of.absence8                      -4.492e+03  1.749e+03  -2.569  0.01462 *
Month.of.absence9                      -1.913e+03  7.176e+02  -2.666  0.01153 *
Month.of.absence10                      1.324e+04  4.522e+03   2.928  0.00596 **
Month.of.absence11                     -3.700e+03  1.467e+03  -2.523  0.01634 *
Month.of.absence12                     -3.260e+03  1.297e+03  -2.514  0.01671 *
Day.of.the.week3                       -1.182e+04  4.541e+03  -2.602  0.01349 *
Day.of.the.week4                        9.954e+02  3.746e+02   2.657  0.01178 *
Day.of.the.week5                        4.068e+02  1.498e+02   2.715  0.01022 *
Day.of.the.week6                       -2.288e+01  2.250e+01  -1.017  0.31611
Seasons2                                5.000e-01  1.610e+00   0.311  0.75796
Seasons3                                1.000e+00  1.054e+00   0.949  0.34920
Seasons4                               -1.000e+00  1.217e+00  -0.822  0.41679
Transportation.expense                  2.641e+04  9.451e+03   2.795  0.00838 **
Distance.from.Residence.to.Work        -8.374e+03  2.955e+03  -2.834  0.00759 **
Service.time                            2.751e+04  9.591e+03   2.868  0.00695 **
Age                                     6.305e+03  2.487e+03   2.536  0.01585 *
Work.load.Average.day                   1.179e+03  4.106e+02   2.872  0.00688 **
Hit.target                             -4.706e+02  1.622e+02  -2.901  0.00639 **
Disciplinary.failure1                   8.626e+03  2.695e+03   3.200  0.00292 **
Education2                              1.491e+04  5.099e+03   2.924  0.00603 **
Education3                             -2.001e+04  6.578e+03  -3.043  0.00442 **
Education4                              6.296e+05  2.411e+05   2.611  0.01321 *
Son                                     2.008e+02  6.406e+01   3.134  0.00348 **
Social.drinker1                         2.040e+03  6.412e+02   3.182  0.00306 **
Social.smoker1                          7.115e+02  2.453e+02   2.901  0.00639 **
Pet                                    -3.460e+02  1.344e+02  -2.575  0.01440 *
Weight                                  2.866e+02  5.899e+02   0.486  0.63007
Height                                 -1.068e+03  4.698e+02  -2.274  0.02923 *
Body.mass.index                         5.374e+02  5.141e+02   1.045  0.30297
Reason.for.absence3:Month.of.absence2          NA         NA      NA       NA
Reason.for.absence5:Month.of.absence2          NA         NA      NA       NA
Reason.for.absence6:Month.of.absence2          NA         NA      NA       NA
Reason.for.absence7:Month.of.absence2   4.696e+04  1.665e+04   2.820  0.00785 **
Reason.for.absence8:Month.of.absence2          NA         NA      NA       NA
Reason.for.absence9:Month.of.absence2          NA         NA      NA       NA
Reason.for.absence10:Month.of.absence2  3.832e+04  1.324e+04   2.895  0.00649 **
Reason.for.absence11:Month.of.absence2  1.720e+05  5.997e+04   2.868  0.00695 **
Reason.for.absence12:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence13:Month.of.absence2  6.542e+04  2.260e+04   2.894  0.00650 **
Reason.for.absence14:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence15:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence16:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence17:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence18:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence19:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence20:Month.of.absence2  7.802e+04  2.654e+04   2.940  0.00579 **
Reason.for.absence21:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence22:Month.of.absence2  1.966e+04  6.250e+03   3.145  0.00338 **
Reason.for.absence23:Month.of.absence2  6.562e+04  2.267e+04   2.895  0.00650 **
Reason.for.absence24:Month.of.absence2         NA         NA      NA       NA
Reason.for.absence25:Month.of.absence2  1.463e+04  5.231e+03   2.796  0.00835 **
Reason.for.absence26:Month.of.absence2  6.663e+04  2.276e+04   2.927  0.00598 **
Reason.for.absence27:Month.of.absence2  6.653e+04  2.298e+04   2.895  0.00650 **
Reason.for.absence28:Month.of.absence2  6.595e+04  2.279e+04   2.894  0.00650 **
Reason.for.absence3:Month.of.absence3          NA         NA      NA       NA
Reason.for.absence5:Month.of.absence3          NA         NA      NA       NA
Reason.for.absence6:Month.of.absence3   8.536e+03  2.808e+03   3.040  0.00446 **
Reason.for.absence7:Month.of.absence3   1.911e+04  6.735e+03   2.837  0.00752 **
Reason.for.absence8:Month.of.absence3  -5.313e+04  1.956e+04  -2.717  0.01017 *
Reason.for.absence9:Month.of.absence3          NA         NA      NA       NA
Reason.for.absence10:Month.of.absence3  4.355e+04  1.588e+04   2.742  0.00957 **
Reason.for.absence11:Month.of.absence3  6.423e+04  2.206e+04   2.912  0.00622 **
Reason.for.absence12:Month.of.absence3 -4.145e+04  1.461e+04  -2.836  0.00754 **
Reason.for.absence13:Month.of.absence3  3.752e+04  1.273e+04   2.948  0.00567 **
Reason.for.absence14:Month.of.absence3  1.287e+04  4.543e+03   2.832  0.00762 **
Reason.for.absence15:Month.of.absence3         NA         NA      NA       NA
Reason.for.absence16:Month.of.absence3         NA         NA      NA       NA
Reason.for.absence17:Month.of.absence3         NA         NA      NA       NA
Reason.for.absence18:Month.of.absence3  1.202e+04  4.169e+03   2.883  0.00670 **
Reason.for.absence19:Month.of.absence3  9.804e+04  3.453e+04   2.839  0.00748 **
Reason.for.absence20:Month.of.absence3  4.404e+04  1.497e+04   2.941  0.00576 **
Reason.for.absence21:Month.of.absence3  6.880e+04  2.305e+04   2.984  0.00515 **
Reason.for.absence22:Month.of.absence3  5.464e+04  1.893e+04   2.887  0.00663 **
Reason.for.absence23:Month.of.absence3  3.812e+04  1.294e+04   2.947  0.00568 **
Reason.for.absence24:Month.of.absence3         NA         NA      NA       NA
Reason.for.absence25:Month.of.absence3 -4.291e+03  2.684e+03  -1.599  0.11892
Reason.for.absence26:Month.of.absence3         NA         NA      NA       NA
Reason.for.absence27:Month.of.absence3  3.913e+04  1.330e+04   2.942  0.00575 **
Reason.for.absence28:Month.of.absence3  3.855e+04  1.308e+04   2.946  0.00569 **
Reason.for.absence3:Month.of.absence4          NA         NA      NA       NA
Reason.for.absence5:Month.of.absence4          NA         NA      NA       NA
Reason.for.absence6:Month.of.absence4          NA         NA      NA       NA
Reason.for.absence7:Month.of.absence4          NA         NA      NA       NA
Reason.for.absence8:Month.of.absence4          NA         NA      NA       NA
Reason.for.absence9:Month.of.absence4          NA         NA      NA       NA
Reason.for.absence10:Month.of.absence4 -9.741e+04  3.542e+04  -2.750  0.00936 **
Reason.for.absence11:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence12:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence13:Month.of.absence4  2.601e+03  1.085e+03   2.397  0.02200 *
Reason.for.absence14:Month.of.absence4 -2.856e+04  9.594e+03  -2.977  0.00526 **
Reason.for.absence15:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence16:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence17:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence18:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence19:Month.of.absence4  5.713e+04  2.049e+04   2.788  0.00852 **
Reason.for.absence20:Month.of.absence4  5.755e+03  1.883e+03   3.057  0.00426 **
Reason.for.absence21:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence22:Month.of.absence4 -1.401e+06  5.376e+05  -2.605  0.01338 *
Reason.for.absence23:Month.of.absence4 -7.467e+02  2.594e+02  -2.878  0.00677 **
Reason.for.absence24:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence25:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence26:Month.of.absence4  4.042e+03  1.273e+03   3.176  0.00311 **
Reason.for.absence27:Month.of.absence4 -1.778e+03  6.852e+02  -2.594  0.01375 *
Reason.for.absence28:Month.of.absence4         NA         NA      NA       NA
Reason.for.absence3:Month.of.absence5          NA         NA      NA       NA
Reason.for.absence5:Month.of.absence5  -4.875e+04  1.641e+04  -2.970  0.00534 **
Reason.for.absence6:Month.of.absence5   7.054e+04  2.496e+04   2.827  0.00773 **
Reason.for.absence7:Month.of.absence5   5.239e+04  1.931e+04   2.713  0.01028 *
Reason.for.absence8:Month.of.absence5  -4.377e+04  1.553e+04  -2.819  0.00788 **
Reason.for.absence9:Month.of.absence5          NA         NA      NA       NA
Reason.for.absence10:Month.of.absence5 -2.085e+04  7.934e+03  -2.627  0.01269 *
Reason.for.absence11:Month.of.absence5  1.446e+04  5.346e+03   2.705  0.01047 *
Reason.for.absence12:Month.of.absence5 -1.809e+04  5.718e+03  -3.163  0.00322 **
Reason.for.absence13:Month.of.absence5  5.085e+04  1.808e+04   2.813  0.00799 **
Reason.for.absence14:Month.of.absence5         NA         NA      NA       NA
Reason.for.absence15:Month.of.absence5         NA         NA      NA       NA
Reason.for.absence16:Month.of.absence5         NA         NA      NA       NA
Reason.for.absence17:Month.of.absence5         NA         NA      NA       NA
Reason.for.absence18:Month.of.absence5  1.564e+05  5.199e+04   3.009  0.00484 **
Reason.for.absence19:Month.of.absence5  1.179e+05  4.202e+04   2.806  0.00814 **
Reason.for.absence20:Month.of.absence5  5.473e+04  1.943e+04   2.817  0.00791 **
Reason.for.absence21:Month.of.absence5         NA         NA      NA       NA
Reason.for.absence22:Month.of.absence5 -1.312e+06  5.051e+05  -2.597  0.01366 *
Reason.for.absence23:Month.of.absence5  5.121e+04  1.822e+04   2.810  0.00805 **
Reason.for.absence24:Month.of.absence5         NA         NA      NA       NA
Reason.for.absence25:Month.of.absence5  6.797e+04  2.304e+04   2.951  0.00562 **
Reason.for.absence26:Month.of.absence5  3.132e+04  1.081e+04   2.898  0.00644 **
Reason.for.absence27:Month.of.absence5  5.432e+04  1.929e+04   2.816  0.00793 **
Reason.for.absence28:Month.of.absence5  5.175e+04  1.836e+04   2.819  0.00788 **
Reason.for.absence3:Month.of.absence6          NA         NA      NA       NA
Reason.for.absence5:Month.of.absence6  -5.704e+04  1.880e+04  -3.035  0.00452 **
Reason.for.absence6:Month.of.absence6          NA         NA      NA       NA
Reason.for.absence7:Month.of.absence6  -5.138e+04  1.670e+04  -3.077  0.00405 **
Reason.for.absence8:Month.of.absence6  -1.437e+04  5.287e+03  -2.718  0.01014 *
Reason.for.absence9:Month.of.absence6          NA         NA      NA       NA
Reason.for.absence10:Month.of.absence6 -3.792e+04  1.316e+04  -2.882  0.00671 **
Reason.for.absence11:Month.of.absence6 -1.584e+04  5.474e+03  -2.894  0.00650 **
Reason.for.absence12:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence13:Month.of.absence6  5.136e+04  1.951e+04   2.632  0.01254 *
Reason.for.absence14:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence15:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence16:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence17:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence18:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence19:Month.of.absence6  1.011e+05  3.666e+04   2.758  0.00917 **
Reason.for.absence20:Month.of.absence6  8.580e+03  2.933e+03   2.925  0.00600 **
Reason.for.absence21:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence22:Month.of.absence6 -1.318e+06  5.064e+05  -2.602  0.01349 *
Reason.for.absence23:Month.of.absence6 -9.685e+02  3.283e+02  -2.950  0.00563 **
Reason.for.absence24:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence25:Month.of.absence6 -1.049e+04  3.348e+03  -3.133  0.00349 **
Reason.for.absence26:Month.of.absence6 -5.073e+04  1.828e+04  -2.775  0.00879 **
Reason.for.absence27:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence28:Month.of.absence6         NA         NA      NA       NA
Reason.for.absence3:Month.of.absence7          NA         NA      NA       NA
Reason.for.absence5:Month.of.absence7          NA         NA      NA       NA
Reason.for.absence6:Month.of.absence7   1.017e+03  2.343e+03   0.434  0.66694
Reason.for.absence7:Month.of.absence7  -5.014e+03  1.777e+03  -2.821  0.00783 **
Reason.for.absence8:Month.of.absence7          NA         NA      NA       NA
Reason.for.absence9:Month.of.absence7          NA         NA      NA       NA
Reason.for.absence10:Month.of.absence7         NA         NA      NA       NA
Reason.for.absence11:Month.of.absence7  1.911e+04  6.310e+03   3.029  0.00459 **
Reason.for.absence12:Month.of.absence7         NA         NA      NA       NA
Reason.for.absence13:Month.of.absence7  8.641e+04  3.107e+04   2.781  0.00867 **
Reason.for.absence14:Month.of.absence7  2.445e+04  7.985e+03   3.062  0.00421 **
Reason.for.absence15:Month.of.absence7 -4.872e+03  2.925e+03  -1.666  0.10467
Reason.for.absence16:Month.of.absence7         NA         NA      NA       NA
Reason.for.absence17:Month.of.absence7         NA         NA      NA       NA
 [ reached getOption("max.print") -- omitted 1264 rows ]
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8605 on 35 degrees of freedom
Multiple R-squared:  0.9956,  Adjusted R-squared:  0.935
F-statistic: 16.43 on 482 and 35 DF,  p-value: 1.228e-15
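One way to see why the .^2 fit looks so good is to count coefficients against rows. The F-statistic line above (482 and 35 DF) implies about 518 training rows, so the interaction model spends almost every degree of freedom it has. The sketch below reproduces that effect on made-up data: toy, f1, f2, x1 and x2 are hypothetical names, and the target y is pure noise, so any R-squared the interaction model reports comes from model size alone.

```r
# Hypothetical stand-in for trainData: same row count, target is pure noise.
set.seed(1)
n <- 518
toy <- data.frame(
  y  = rnorm(n),
  f1 = factor(sample(1:27, n, replace = TRUE)),  # many-level factor
  f2 = factor(sample(1:12, n, replace = TRUE)),  # month-like factor
  x1 = rnorm(n),
  x2 = rnorm(n)
)

# Design-matrix columns = coefficients lm() will try to estimate.
p_main  <- ncol(model.matrix(y ~ .,   data = toy))
p_inter <- ncol(model.matrix(y ~ .^2, data = toy))

fit_main  <- lm(y ~ .,   data = toy)
fit_inter <- lm(y ~ .^2, data = toy)

# Coefficients reported as NA are the "not defined because of
# singularities" ones, e.g. factor combinations with no rows.
n_na <- sum(is.na(coef(fit_inter)))

r2_main  <- summary(fit_main)$r.squared
r2_inter <- summary(fit_inter)$r.squared

cat("rows:", n, " main-effect columns:", p_main,
    " with interactions:", p_inter, " NA coefficients:", n_na, "\n")
cat("R-squared on noise, main effects:", round(r2_main, 3),
    " with interactions:", round(r2_inter, 3), "\n")
```

Because y is unrelated to every predictor here, the gap between the two R-squared values illustrates how much of a high R-squared can be produced by the number of terms rather than by real structure in the data.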
I am new to statistics and regression, so what is the correct way to do this? How should I choose which predictors and interactions to add to my model?