@varun214 wrote:
Hi all
I have a dataset covering 3 months of entries (aggregated at the daily level), for which I am trying to develop a multivariate time series model. The data has 8 different variables, all of which I need to predict, and I thought a multivariate time series model would be best suited for this. I tried it using the article at Analytics Vidhya, and there are a few things I am not able to understand. I am new to modelling and still learning.
- The predictions are way off the validation set. The month-wise pattern is not being followed, and the predicted values just follow an increasing trend. The model does not take into account that a new month is starting.
- I used the Johansen test (coint_johansen) to check stationarity, but I think it is not the only test that should be used.
Can anyone advise on the best way to model this kind of trend?
Problem statement - Predict the variables for the next 15 days based on the last 3 months of data (values aggregated on a daily basis).
Dataset -
FinalDataset_model copy.csv (5.0 KB)
Code -
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.vector_ar.var_model import VAR
from statsmodels.tsa.vector_ar.vecm import coint_johansen
import numpy as np
from sklearn.metrics import mean_squared_error

# read the data
df = pd.read_csv("FinalDataset_to_Model.csv")

# check the dtypes
print(df.dtypes)

# parse the date column and use it as the index
df['Flight_Date'] = pd.to_datetime(df.Flight_Date, format='%d/%m/%y')
data = df.drop(['Flight_Date'], axis=1)
data.index = df.Flight_Date

# since the test works for only 12 variables, I have randomly dropped
# in the next iteration, I would drop another and check the eigenvalues
johan_test_temp = data
res = coint_johansen(johan_test_temp, -1, 1).eig
print(res)

# creating the train and validation set
train = data[:int(0.8 * len(data))]
valid = data[int(0.8 * len(data)):]

# fit the model
model = VAR(endog=train, freq=train.index.inferred_freq)
model_fit = model.fit()

# make prediction on validation
# (model_fit.y is a deprecated alias for endog; the last k_ar training rows are enough)
prediction = model_fit.forecast(train.values[-model_fit.k_ar:], steps=len(valid))

# put the forecasts for all columns into a DataFrame aligned with the validation set
cols = data.columns
pred = pd.DataFrame(prediction, index=valid.index, columns=cols)

# check rmse per variable (column) rather than per row
for col in cols:
    print("\n\npred - ", pred[col])
    print("valid - ", valid[col])
    print('rmse value for', col, 'is : ', np.sqrt(mean_squared_error(valid[col], pred[col])))

Can anyone advise on this?
Thanks