@sidharth.bolar89 wrote:
Hello,
I have been studying time series analysis and, after going through the available resources, decided to test what I have learned by taking part in the time series problem on AV (here). However, I seem to be way off track and would like to understand whether I am proceeding in the right direction.
The data consists of hourly hits on a fictional website for the period 2012 to 2014.
My first step was to visualise the data. I observed high variance in the values, so I resampled to weekly frequency using the mean, forward-filling any missing values. The initial graph indicates an upward trend, and the results of the ADF test confirm that the series is non-stationary:
Results of Dickey-Fuller Test:
Test Statistic 1.237717
p-value 0.996236
#Lags Used 8.000000
Number of Observations Used 144.000000
Critical Value (5%) -2.881829
Critical Value (1%) -3.476598
Critical Value (10%) -2.577589
dtype: float64

I then removed any outlier values, if present:
df['Count'] = df['Count'].clip(df['Count'].quantile(0.001), df['Count'].quantile(0.999))

Also, as the size of the fluctuations varies greatly across the series, I applied a log transform to even them out:
plot of log values
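
For reference, the weekly resampling, clipping and log transform described above could be sketched roughly as follows. The hourly data here is synthetic (an assumption, since the competition file is not reproduced in this post), and the names ts_week and ts_week_log are only illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts with an upward trend, standing in for the real data (assumption).
rng = np.random.default_rng(0)
idx = pd.date_range("2012-01-01", "2014-12-31 23:00", freq=pd.Timedelta(hours=1))
trend = np.linspace(10, 400, len(idx))
df = pd.DataFrame({"Count": rng.poisson(trend)}, index=idx)

# Weekly mean, forward-filling any week that has no observations.
ts_week = df["Count"].resample("W").mean().ffill()

# Clip extreme values to the 0.1% and 99.9% quantiles, as in the post.
lower, upper = ts_week.quantile(0.001), ts_week.quantile(0.999)
ts_week = ts_week.clip(lower, upper)

# Log transform to damp the variance, which grows with the level of the series.
ts_week_log = np.log(ts_week)
```

The log transform only makes sense if every weekly value is strictly positive, which holds for this synthetic series but would need checking on the real one.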
This does not detrend the series, so I took the first difference of the log values.
The ADF test results after this are as follows:
Results of Dickey-Fuller Test:
Test Statistic -5.005418
p-value 0.000022
#Lags Used 12.000000
Number of Observations Used 139.000000
Critical Value (5%) -2.882568
Critical Value (1%) -3.478294
Critical Value (10%) -2.577983
dtype: float64

With the above results I was confident that my series is stationary and that I could proceed with model building.
The ACF and PACF plot results are then as follows:
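from matplotlib import pyplot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_pacf(ts_week_log_diff, lags=5)
plot_acf(ts_week_log_diff, lags=5)
pyplot.show()

Now, based on the graphs, I take the AR and MA orders to be 1 each and fit an ARIMA model.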
The result summary is as follows:
ARMA Model Results
==============================================================================
Dep. Variable:             Count   No. Observations:          152
Model:                ARMA(1, 1)   Log Likelihood        -149.446
Method:                  css-mle   S.D. of innovations      0.644
Date:           Fri, 15 Dec 2017   AIC                    306.893
Time:                   16:39:35   BIC                    318.988
Sample:               01-22-2012   HQIC                   311.806
                    - 12-14-2014

                 coef   std err         z    P>|z|    [0.025    0.975]
const          0.0210     0.010     2.055    0.042     0.001     0.041
ar.L1.Count   -0.0495     0.105    -0.472    0.638    -0.255     0.156
ma.L1.Count   -0.8020     0.072   -11.200    0.000    -0.942    -0.662

Roots
            Real    Imaginary    Modulus    Frequency
AR.1    -20.2032     +0.0000j    20.2032       0.5000
MA.1      1.2469     +0.0000j     1.2469       0.0000

The final predictions are way off the mark, and I am not entirely sure where I am making a mistake.
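
One step that is not shown above is how the model's predictions are converted back to counts. Because the model was fitted on the first difference of the log series, its raw predictions are changes in log(Count), not counts, so that conversion may be worth checking. The sketch below shows the back-transformation under that assumption; invert_log_diff is a hypothetical helper and the four-point series is made up purely to demonstrate the round trip.

```python
import numpy as np
import pandas as pd

def invert_log_diff(pred_diff, last_log_value):
    """Turn predicted log-differences back into values on the original scale.

    pred_diff: predicted changes in log(Count), one per future step.
    last_log_value: the last observed log(Count) before the first prediction.
    """
    # Undo the differencing by accumulating changes on top of the last known level.
    log_level = last_log_value + np.cumsum(pred_diff)
    # Undo the log transform.
    return np.exp(log_level)

# Toy weekly series (made up) to check that the round trip works.
ts_week = pd.Series([100.0, 110.0, 121.0, 133.1])
ts_week_log = np.log(ts_week)
ts_week_log_diff = ts_week_log.diff().dropna()

# Treating the observed differences as "predictions" should recover the original values.
rebuilt = invert_log_diff(ts_week_log_diff.to_numpy(), ts_week_log.iloc[0])
print(rebuilt)
```

If the predictions were compared with the actual counts without the cumulative sum and the exponential, they would look far too small, which is one possible reading of "way off the mark".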
I am learning to do this analysis, and any direction on this will be highly appreciated.