@kanav wrote:
Thanks to Analytics Vidhya and Club Mahindra for organising such a wonderful hackathon. The competition was quite intense, and the dataset was very clean to work with.
Approach:
Step 1:
I started with a very basic approach: converting all the ordinal features (resort_id, persontravellingID, main_product_code, and the rest) to the category dtype, and deriving some common features from each date column [booking_date, checkin_date, checkout_date]:
- Weekday
- Month
- Day
- Day of year
- Week of year
- Is month end
- Year
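A minimal pandas sketch of the date-part features above; the toy frame stands in for the competition data, and the column name is illustrative:

```python
import pandas as pd

def add_date_parts(df, col):
    """Derive the calendar features listed above from one date column."""
    d = pd.to_datetime(df[col])
    df[f"{col}_weekday"] = d.dt.weekday
    df[f"{col}_month"] = d.dt.month
    df[f"{col}_day"] = d.dt.day
    df[f"{col}_dayofyear"] = d.dt.dayofyear
    df[f"{col}_weekofyear"] = d.dt.isocalendar().week.astype(int)
    df[f"{col}_is_month_end"] = d.dt.is_month_end.astype(int)
    df[f"{col}_year"] = d.dt.year
    return df

# Toy frame standing in for the competition data
df = pd.DataFrame({"checkin_date": ["2019-03-31", "2019-04-01"]})
df = add_date_parts(df, "checkin_date")
```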
Step 2: Intuitive features
- in_out: checkout_date - checkin_date (length of stay)
- book_in: checkout_date - booking_date (booking span)
- roomnights_per_stay: roomnights / in_out
- roomnights_per_book_span: roomnights / book_in
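The duration features above can be sketched like this (toy data, column names following the post):

```python
import pandas as pd

# One toy booking: booked Jan 1, stayed Jan 10-14, 8 roomnights booked
df = pd.DataFrame({
    "booking_date":  pd.to_datetime(["2019-01-01"]),
    "checkin_date":  pd.to_datetime(["2019-01-10"]),
    "checkout_date": pd.to_datetime(["2019-01-14"]),
    "roomnights": [8],
})
df["in_out"] = (df["checkout_date"] - df["checkin_date"]).dt.days
df["book_in"] = (df["checkout_date"] - df["booking_date"]).dt.days
df["roomnights_per_stay"] = df["roomnights"] / df["in_out"]
df["roomnights_per_book_span"] = df["roomnights"] / df["book_in"]
```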
Step 3: Time-based features
- prev_resort_time: time when the resort was previously booked.
- prev_resort_member_time: time when the resort was previously booked by that particular member.
- next_resort_time: time when the resort will next be booked.
- next_resort_member_time: time when the resort will next be booked by that particular member.
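One way to compute these lag/lead features (a sketch, assuming bookings can be sorted by checkin_date; data and column names are illustrative):

```python
import pandas as pd

# Three toy bookings of the same resort, two by the same member
df = pd.DataFrame({
    "resort_id":    ["A", "A", "A"],
    "memberid":     [1, 2, 1],
    "checkin_date": pd.to_datetime(["2019-01-01", "2019-01-05", "2019-01-09"]),
}).sort_values("checkin_date")

# Previous/next booking of the same resort
g = df.groupby("resort_id")["checkin_date"]
df["prev_resort_time"] = g.shift(1)
df["next_resort_time"] = g.shift(-1)

# Previous/next booking of the same resort by the same member
gm = df.groupby(["resort_id", "memberid"])["checkin_date"]
df["prev_resort_member_time"] = gm.shift(1)
df["next_resort_member_time"] = gm.shift(-1)
```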
Step 4: Group-by features

| S.No. | Type | Value column | Grouped on |
|---|---|---|---|
| 1 | COUNT | – | resort_id |
| 2 | COUNT | – | resort_id, memberid |
| 3 | COUNT | – | resort_id, checkout_dateyear, checkout_datemonth |
| 4 | COUNT | – | memberid, checkout_dateyear |
| 5 | VAR | roomnights | resort_id |
| 6 | MEDIAN | roomnights | resort_id, memberid |
| 7 | MAX | roomnights | resort_id, checkout_dateyear, checkout_datemonth |
| 8 | MIN | roomnights | memberid, checkout_dateyear |
| 9 | VAR | in_out | resort_id |
| 10 | MEDIAN | in_out | resort_id, memberid |
| 11 | MAX | in_out | resort_id, checkout_dateyear, checkout_datemonth |
| 12 | MIN | in_out | memberid, checkout_dateyear |
| 13 | VAR | total_pax | resort_id |
| 14 | MEDIAN | total_pax | resort_id, memberid |
| 15 | MAX | total_pax | resort_id, checkout_dateyear, checkout_datemonth |
| 16 | MIN | total_pax | memberid, checkout_dateyear |

... in a similar fashion, roughly ~72 combinations were tried, which improved the LB RMSE from 96 to 95.3, with nearly the same change in local CV.
Modeling:
My final model is an ensemble of:
[LightGBM, CatBoost, 5-fold LightGBM, 5-fold CatBoost, and a stack of [XGB, CatBoost, LightGBM]]
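The final blend can be sketched as a simple weighted average of the individual model predictions; the prediction values and equal weights here are placeholders, not the ones used in the competition:

```python
import numpy as np

# Illustrative out-of-fold/test predictions from each ensemble member
preds = {
    "lightgbm":          np.array([95.0, 100.0]),
    "catboost":          np.array([97.0, 102.0]),
    "lgb_5fold":         np.array([96.0, 101.0]),
    "cat_5fold":         np.array([96.0, 99.0]),
    "stack_xgb_cat_lgb": np.array([95.0, 100.0]),
}
weights = {k: 1 / len(preds) for k in preds}  # equal weights as a placeholder
final = sum(w * preds[k] for k, w in weights.items())
```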
- Params tuned using Bayesian Optimization:
"objective": "regression", "metric": "rmse", "n_estimators": 3000, "early_stopping_rounds": 200,
"num_leaves": 31, "learning_rate": 0.05, "bagging_fraction": 0.9,
"bagging_seed": 0, "num_threads": 4, "colsample_bytree": 0.8, "lambda_l1": 25, "min_child_weight": 44.9, "min_child_samples": 5