Channel: Data Science, Analytics and Big Data discussions - Latest topics

Time Series Analysis - queries on practice problem


@sidharth.bolar89 wrote:

Hello,
I have been studying time series analysis and, after going through the available resources, decided to test what I have learned by taking part in the time series problem on AV (here).

However, I seem to be way off track and would like to understand whether I am proceeding in the right direction:

The data consists of hourly hit counts for a fictional website over the period 2012 to 2014.
My first step was to visualise the data. I observed a high variance in the values, so I resampled to weekly means and forward filled any missing values.
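A minimal sketch of the weekly resampling step described above, assuming the hourly counts live in a pandas Series with a datetime index. The data here is synthetic and the variable names are placeholders, not taken from the competition files.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly hit counts (four full weeks, Monday to Sunday).
rng = np.random.default_rng(0)
hours = pd.date_range("2012-01-02", periods=24 * 28, freq=pd.Timedelta(hours=1))
counts = pd.Series(rng.poisson(50, size=len(hours)), index=hours, name="Count").astype(float)

# Blank out the whole second week to simulate missing data.
counts.iloc[168:336] = np.nan

# Weekly mean; a week with no observations at all comes out as NaN.
weekly = counts.resample("W").mean()

# Forward fill carries the previous week's mean into the empty week.
weekly = weekly.ffill()
print(weekly)
```

Note that forward filling only copies the last known weekly mean, so long gaps would show up as flat stretches in the resampled series.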

The initial graph indicates an upward trend, and the results of the ADF test confirm that the series is non-stationary:

Results of Dickey-Fuller Test:
Test Statistic 1.237717
p-value 0.996236
#Lags Used 8.000000
Number of Observations Used 144.000000
Critical Value (5%) -2.881829
Critical Value (1%) -3.476598
Critical Value (10%) -2.577589
dtype: float64

I then clipped any outliers, if present, to the 0.1% and 99.9% quantiles:
df['Count'] = df['Count'].clip(df['Count'].quantile(0.001), df['Count'].quantile(0.999))

Also, as the size of the variation differs greatly across the series, I took the log of the values to even out the fluctuations:

plot of log values

This does not detrend the series, so I took the first difference of the log values.
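A small sketch of the log and first-difference transform, together with its inverse, using made-up numbers. The inverse matters because anything predicted on the differenced log scale has to be cumulated and exponentiated before it can be compared with the original counts.

```python
import numpy as np
import pandas as pd

# Made-up weekly values, for illustration only.
weekly = pd.Series([120.0, 150.0, 135.0, 180.0, 240.0], name="Count")

# Forward transform: log, then first difference (the first observation is lost).
log_weekly = np.log(weekly)
log_diff = log_weekly.diff().dropna()

# Inverse transform: cumulate the differences from the first log value, then exponentiate.
restored_log = log_weekly.iloc[0] + log_diff.cumsum()
restored = np.exp(restored_log)

print(log_diff)
print(restored)
```

Differencing drops one observation, which is consistent with the count going from 153 weekly values before differencing to the 152 reported in the model summary.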

The ADF test results after this are as follows:

Results of Dickey-Fuller Test:
Test Statistic -5.005418
p-value 0.000022
#Lags Used 12.000000
Number of Observations Used 139.000000
Critical Value (5%) -2.882568
Critical Value (1%) -3.478294
Critical Value (10%) -2.577983
dtype: float64

With the above results I was confident that my series was stationary and that I could proceed with model building.

The ACF and PACF plot results are then as follows:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from matplotlib import pyplot
plot_pacf(ts_week_log_diff, lags=5)
plot_acf(ts_week_log_diff, lags=5)
pyplot.show()

Now, based on the graphs, I take both the AR and MA orders to be 1 and fit an ARIMA model.

The result summary is as follows:

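                              ARMA Model Results
==============================================================================
Dep. Variable:                  Count   No. Observations:                  152
Model:                     ARMA(1, 1)   Log Likelihood                -149.446
Method:                       css-mle   S.D. of innovations              0.644
Date:                Fri, 15 Dec 2017   AIC                            306.893
Time:                        16:39:35   BIC                            318.988
Sample:                    01-22-2012   HQIC                           311.806
                         - 12-14-2014
==============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const           0.0210      0.010      2.055      0.042       0.001       0.041
ar.L1.Count    -0.0495      0.105     -0.472      0.638      -0.255       0.156
ma.L1.Count    -0.8020      0.072    -11.200      0.000      -0.942      -0.662

                                    Roots
==============================================================================
                 Real           Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1          -20.2032           +0.0000j           20.2032            0.5000
MA.1            1.2469           +0.0000j            1.2469            0.0000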

The final predictions are way off the mark, and I am not entirely sure where I am making a mistake.
I am still learning to do this analysis, and any directions on this will be highly appreciated.
