In this notebook, we will be conducting a time series analysis on the Ethereum price. As of late, cryptocurrencies have been incredibly popular in the media and in the financial community. Let’s see if we can build a good time series model.

Reading In Data

ether = read.csv(file.choose(), header=T)
attach(ether)
head(ether)
dim(ether)
[1] 866   3
colnames(ether)
[1] "Date.UTC."     "UnixTimeStamp" "Value"        

Let’s plot date against the price (Note: this will create a funky looking plot. But we will do it for the sake of an example)

plot(Date.UTC., Value)

Now, in order to take advantage of the data variable we will need to transform. If you are new to time series analysis, it is important that you transform your date explanatory variable.

Now, let’s do this the correct way.

time = 1:866 
plot(time, Value)

From this plot, we can see that there isn’t really a linear relationship of Value with time. Let’s create a time squared variable and build a simple linear regression model (one without the time squared variable, and one with it).

Simple Linear Regression Model

m1 = lm(Value~time)
summary(m1)

Call:
lm(formula = Value ~ time)

Residuals:
    Min      1Q  Median      3Q     Max 
-127.37  -72.99    4.16   60.79  443.83 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -92.46162    5.66996  -16.31   <2e-16 ***
time          0.40199    0.01133   35.48   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 83.36 on 864 degrees of freedom
Multiple R-squared:  0.593, Adjusted R-squared:  0.5925 
F-statistic:  1259 on 1 and 864 DF,  p-value: < 2.2e-16

It appears that we have a significant model where the F p-value=2.2e-16 < alpha=.05. Let’s see if the squared term will improve the R2 and R2a value

time_sq = time ** 2 
m2 = lm(Value~time+time_sq)
summary(m2)

Call:
lm(formula = Value ~ time + time_sq)

Residuals:
    Min      1Q  Median      3Q     Max 
-87.329 -36.385   2.573  30.218 295.398 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.804e+01  5.049e+00   11.50   <2e-16 ***
time        -6.384e-01  2.689e-02  -23.74   <2e-16 ***
time_sq      1.200e-03  3.004e-05   39.95   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 49.41 on 863 degrees of freedom
Multiple R-squared:  0.8571,    Adjusted R-squared:  0.8568 
F-statistic:  2589 on 2 and 863 DF,  p-value: < 2.2e-16

The residual standard error decreased by half, and the R2 and R2a increased greatly. We now have a model that explains 85.71% of the variation amongst Value, which represents the price of one ether.

Let’s see if we can perform a random walk and conduct some forecasting

Random Walk and Forecasting

new_value = ts(Value)
plot(new_value, type="b")

So it appears that Value isn’t really a random walk. It only becomes a random walk after time 600.

ether[600,1]
[1] 3/20/2017
866 Levels: 1/1/2016 1/1/2017 1/10/2016 1/10/2017 1/11/2016 ... 9/9/2017

If we look at the 600th observation, we find the date to be at 3/20/2017. It makes sense why the data appears to look more like a random walk because this is when the value of ethereum began to become more volatile as more people started to become a part of ethereum (mining, create DApps, etc.).

Let’s make another plot after time 600 to see if it looks more like a random walk.

new_value = Value[600:length(Value)]
time_value = ts(new_value)
plot(time_value, type="b")

For this random walk let’s conduct some forecasting. We will first see which time lag contributes the most to forecasting the value of ethereum.

Forecasting

ether[nrow(ether), ]
tail(new_value)
[1] 422.48 421.15 513.29 656.52 699.09 693.58
new_value[length(new_value)]
[1] 693.58

From this output, we find that the last day recorded is on 12/14/2017. The last value of Value is $693.58. Let’s conduct a forecast to today (12/18/2017), and a week from the 14th.

Now this is incredibly off the mark. The price for ethereum today is $789. The reason why the price has increased, is because there has been so much activity within the past week.

diff = new_value[-1]- new_value[1:1-length(new_value)]
c_bar = mean(diff)
four_days = new_value[length(new_value)] + 4*c_bar
seven_days = new_value[length(new_value)] + 7*c_bar
four_days
[1] 703.4534
seven_days
[1] 710.8584

The random walk forecasts that in 4 days the price of ethereum will be $703.4534 and in a week it will be $710.8584.

Autocorrelations

acf(time_value, lag.max=20,plot=F)
pacf(time_value, lag.max=20)

From the output from the autocorrelation, and the plot from the partial autocorrelation, we find that lag1 sufficiently explains what is occuring at the subsequent timestamps. Let’s create an AR(1) model using lag1.

yt = new_value[-1]
ylag1= new_value[-length(new_value)]

plot(ylag1, yt)

We find a very strong linear relationship between the current time and the previous timestamp. Let us build the model.

# AR(1)
m2 = lm(yt~ylag1)
summary(m2)

Call:
lm(formula = yt ~ ylag1)

Residuals:
    Min      1Q  Median      3Q     Max 
-55.390  -7.473  -1.638   5.452 138.648 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.492190   2.599781   0.189     0.85    
ylag1       1.007967   0.009481 106.316   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.91 on 264 degrees of freedom
Multiple R-squared:  0.9772,    Adjusted R-squared:  0.9771 
F-statistic: 1.13e+04 on 1 and 264 DF,  p-value: < 2.2e-16

So we have an incredible model in which the R2 value=.9772, and the R2a=.9971. The residual standard error decreased compared to model one, and we have a very significant model. In the next week or so, we will pull in more data to verify, by predicting on, the current value of ethereum. This is very exciting!

Data Science Rules!

Until next time,

Joel Afriyie

LS0tCnRpdGxlOiAiVGltZSBTZXJpZXMgb24gRXRoZXJldW0gSGlzdG9yaWNhbCBEYXRhIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpJbiB0aGlzIG5vdGVib29rLCB3ZSB3aWxsIGJlIGNvbmR1Y3RpbmcgYSB0aW1lIHNlcmllcyBhbmFseXNpcyBvbiB0aGUgRXRoZXJldW0gcHJpY2UuIEFzIG9mIGxhdGUsIGNyeXB0b2N1cnJlbmNpZXMgaGF2ZSBiZWVuIGluY3JlZGlibHkgcG9wdWxhciBpbiB0aGUgbWVkaWEgYW5kIGluIHRoZSBmaW5hbmNpYWwgY29tbXVuaXR5LiBMZXQncyBzZWUgaWYgd2UgY2FuIGJ1aWxkIGEgZ29vZCB0aW1lIHNlcmllcyBtb2RlbC4KCgojIyBSZWFkaW5nIEluIERhdGEKYGBge3J9CmV0aGVyID0gcmVhZC5jc3YoZmlsZS5jaG9vc2UoKSwgaGVhZGVyPVQpCmBgYAoKYGBge3J9CmF0dGFjaChldGhlcikKaGVhZChldGhlcikKZGltKGV0aGVyKQpjb2xuYW1lcyhldGhlcikKYGBgCkxldCdzIHBsb3QgZGF0ZSBhZ2FpbnN0IHRoZSBwcmljZSAoTm90ZTogdGhpcyB3aWxsIGNyZWF0ZSBhIGZ1bmt5IGxvb2tpbmcgcGxvdC4gQnV0IHdlIHdpbGwgZG8gaXQgZm9yIHRoZSBzYWtlIG9mIGFuIGV4YW1wbGUpCgpgYGB7cn0KcGxvdChEYXRlLlVUQy4sIFZhbHVlKQpgYGAKCk5vdywgaW4gb3JkZXIgdG8gdGFrZSBhZHZhbnRhZ2Ugb2YgdGhlIGRhdGEgdmFyaWFibGUgd2Ugd2lsbCBuZWVkIHRvIHRyYW5zZm9ybS4gSWYgeW91IGFyZSBuZXcgdG8gdGltZSBzZXJpZXMgYW5hbHlzaXMsIGl0IGlzIGltcG9ydGFudCB0aGF0IHlvdSB0cmFuc2Zvcm0geW91ciBkYXRlIGV4cGxhbmF0b3J5IHZhcmlhYmxlLgoKTm93LCBsZXQncyBkbyB0aGlzIHRoZSBjb3JyZWN0IHdheS4gCgpgYGB7cn0KdGltZSA9IDE6ODY2IApwbG90KHRpbWUsIFZhbHVlKQpgYGAKRnJvbSB0aGlzIHBsb3QsIHdlIGNhbiBzZWUgdGhhdCB0aGVyZSBpc24ndCByZWFsbHkgYSBsaW5lYXIgcmVsYXRpb25zaGlwIG9mIFZhbHVlIHdpdGggdGltZS4gTGV0J3MgY3JlYXRlIGEgdGltZSBzcXVhcmVkIHZhcmlhYmxlIGFuZCBidWlsZCBhIHNpbXBsZSBsaW5lYXIgcmVncmVzc2lvbiBtb2RlbCAob25lIHdpdGhvdXQgdGhlIHRpbWUgc3F1YXJlZCB2YXJpYWJsZSwgYW5kIG9uZSB3aXRoIGl0KS4KCiMjU2ltcGxlIExpbmVhciBSZWdyZXNzaW9uIE1vZGVsCmBgYHtyfQptMSA9IGxtKFZhbHVlfnRpbWUpCnN1bW1hcnkobTEpCmBgYApJdCBhcHBlYXJzIHRoYXQgd2UgaGF2ZSBhIHNpZ25pZmljYW50IG1vZGVsIHdoZXJlIHRoZSBGIHAtdmFsdWU9Mi4yZS0xNiA8IGFscGhhPS4wNS4gTGV0J3Mgc2VlIGlmIHRoZSBzcXVhcmVkIHRlcm0gd2lsbCBpbXByb3ZlIHRoZSBSMiBhbmQgUjJhIHZhbHVlCgpgYGB7cn0KdGltZV9zcSA9IHRpbWUgKiogMiAKbTIgPSBsbShWYWx1ZX50aW1lK3RpbWVfc3EpCnN1bW1hcnkobTIpCmBgYApUaGUgcmVzaWR1YWwgc3RhbmRhcmQgZXJyb3IgZGVjcmVhc2VkIGJ5IGhhbGYsIGFuZCB0aGUgUjIgYW5kIFIyYSBpbmNyZWFzZWQgZ3JlYXRseS4gV2Ugbm93IGhhdmUgYSBtb2RlbCB0aGF0IGV4cGxhaW5zIDg1LjcxJSBvZiB0aGUgdmFyaWF0aW9uIGFtb25nc3QgVmFsdWUsIHdoaWNoIHJlcHJlc2VudHMgdGhlIHByaWNlIG9mIG9uZSBldGhlci4gCgpMZXQncyBzZWUgaWYgd2UgY2FuIHBlcmZvcm0gYSByYW5kb20gd2FsayBhbmQgY29uZHVjdCBzb21lIGZvcmVjYXN0aW5nCgojI1JhbmRvbSBXYWxrIGFuZCBGb3JlY2FzdGluZwpgYGB7cn0KbmV3X3ZhbHVlID0gdHMoVmFsdWUpCnBsb3QobmV3X3ZhbHVlLCB0eXBlPSJiIikKYGBgClNvIGl0IGFwcGVhcnMgdGhhdCBWYWx1ZSBpc24ndCByZWFsbHkgYSByYW5kb20gd2Fsay4gSXQgb25seSBiZWNvbWVzIGEgcmFuZG9tIHdhbGsgYWZ0ZXIgdGltZSA2MDAuCmBgYHtyfQpldGhlcls2MDAsMV0KYGBgCklmIHdlIGxvb2sgYXQgdGhlIDYwMHRoIG9ic2VydmF0aW9uLCB3ZSBmaW5kIHRoZSBkYXRlIHRvIGJlIGF0IDMvMjAvMjAxNy4gSXQgbWFrZXMgc2Vuc2Ugd2h5IHRoZSBkYXRhIGFwcGVhcnMgdG8gbG9vayBtb3JlIGxpa2UgYSByYW5kb20gd2FsayBiZWNhdXNlIHRoaXMgaXMgd2hlbiB0aGUgdmFsdWUgb2YgZXRoZXJldW0gYmVnYW4gdG8gYmVjb21lIG1vcmUgdm9sYXRpbGUgYXMgbW9yZSBwZW9wbGUgc3RhcnRlZCB0byBiZWNvbWUgYSBwYXJ0IG9mIGV0aGVyZXVtIChtaW5pbmcsIGNyZWF0ZSBEQXBwcywgZXRjLikuCgpMZXQncyBtYWtlIGFub3RoZXIgcGxvdCBhZnRlciB0aW1lIDYwMCB0byBzZWUgaWYgaXQgbG9va3MgbW9yZSBsaWtlIGEgcmFuZG9tIHdhbGsuIApgYGB7cn0KbmV3X3ZhbHVlID0gVmFsdWVbNjAwOmxlbmd0aChWYWx1ZSldCnRpbWVfdmFsdWUgPSB0cyhuZXdfdmFsdWUpCnBsb3QodGltZV92YWx1ZSwgdHlwZT0iYiIpCmBgYApGb3IgdGhpcyByYW5kb20gd2FsayBsZXQncyBjb25kdWN0IHNvbWUgZm9yZWNhc3RpbmcuIFdlIHdpbGwgZmlyc3Qgc2VlIHdoaWNoIHRpbWUgbGFnIGNvbnRyaWJ1dGVzIHRoZSBtb3N0IHRvIGZvcmVjYXN0aW5nIHRoZSB2YWx1ZSBvZiBldGhlcmV1bS4gCgojI0ZvcmVjYXN0aW5nIAoKYGBge3J9CmV0aGVyW25yb3coZXRoZXIpLCBdCnRhaWwobmV3X3ZhbHVlKQpuZXdfdmFsdWVbbGVuZ3RoKG5ld192YWx1ZSldCmBgYApGcm9tIHRoaXMgb3V0cHV0LCB3ZSBmaW5kIHRoYXQgdGhlIGxhc3QgZGF5IHJlY29yZGVkIGlzIG9uIDEyLzE0LzIwMTcuIFRoZSBsYXN0IHZhbHVlIG9mIFZhbHVlIGlzICQ2OTMuNTguIExldCdzIGNvbmR1Y3QgYSBmb3JlY2FzdCB0byB0b2RheSAoMTIvMTgvMjAxNyksIGFuZCBhIHdlZWsgZnJvbSB0aGUgMTR0aC4KCk5vdyB0aGlzIGlzIGluY3JlZGlibHkgb2ZmIHRoZSBtYXJrLiBUaGUgcHJpY2UgZm9yIGV0aGVyZXVtIHRvZGF5IGlzICQ3ODkuIFRoZSByZWFzb24gd2h5IHRoZSBwcmljZSBoYXMgaW5jcmVhc2VkLCBpcyBiZWNhdXNlIHRoZXJlIGhhcyBiZWVuIHNvIG11Y2ggYWN0aXZpdHkgd2l0aGluIHRoZSBwYXN0IHdlZWsuIAoKYGBge3J9CmRpZmYgPSBuZXdfdmFsdWVbLTFdLSBuZXdfdmFsdWVbMToxLWxlbmd0aChuZXdfdmFsdWUpXQpjX2JhciA9IG1lYW4oZGlmZikKCmZvdXJfZGF5cyA9IG5ld192YWx1ZVtsZW5ndGgobmV3X3ZhbHVlKV0gKyA0KmNfYmFyCnNldmVuX2RheXMgPSBuZXdfdmFsdWVbbGVuZ3RoKG5ld192YWx1ZSldICsgNypjX2Jhcgpmb3VyX2RheXMKc2V2ZW5fZGF5cwpgYGAKVGhlIHJhbmRvbSB3YWxrIGZvcmVjYXN0cyB0aGF0IGluIDQgZGF5cyB0aGUgcHJpY2Ugb2YgZXRoZXJldW0gd2lsbCBiZSAkNzAzLjQ1MzQgYW5kIGluIGEgd2VlayBpdCB3aWxsIGJlICQ3MTAuODU4NC4gCgoKIyNBdXRvY29ycmVsYXRpb25zIApgYGB7cn0KYWNmKHRpbWVfdmFsdWUsIGxhZy5tYXg9MjAscGxvdD1GKQpwYWNmKHRpbWVfdmFsdWUsIGxhZy5tYXg9MjApCmBgYApGcm9tIHRoZSBvdXRwdXQgZnJvbSB0aGUgYXV0b2NvcnJlbGF0aW9uLCBhbmQgdGhlIHBsb3QgZnJvbSB0aGUgcGFydGlhbCBhdXRvY29ycmVsYXRpb24sIHdlIGZpbmQgdGhhdCBsYWcxIHN1ZmZpY2llbnRseSBleHBsYWlucyB3aGF0IGlzIG9jY3VyaW5nIGF0IHRoZSBzdWJzZXF1ZW50IHRpbWVzdGFtcHMuIExldCdzIGNyZWF0ZSBhbiBBUigxKSBtb2RlbCB1c2luZyBsYWcxLiAKCmBgYHtyfQp5dCA9IG5ld192YWx1ZVstMV0KeWxhZzE9IG5ld192YWx1ZVstbGVuZ3RoKG5ld192YWx1ZSldCgpwbG90KHlsYWcxLCB5dCkKYGBgCldlIGZpbmQgYSB2ZXJ5IHN0cm9uZyBsaW5lYXIgcmVsYXRpb25zaGlwIGJldHdlZW4gdGhlIGN1cnJlbnQgdGltZSBhbmQgdGhlIHByZXZpb3VzIHRpbWVzdGFtcC4gTGV0IHVzIGJ1aWxkIHRoZSBtb2RlbC4gCgpgYGB7cn0KIyBBUigxKQptMiA9IGxtKHl0fnlsYWcxKQpzdW1tYXJ5KG0yKQpgYGAKU28gd2UgaGF2ZSBhbiBpbmNyZWRpYmxlIG1vZGVsIGluIHdoaWNoIHRoZSBSMiB2YWx1ZT0uOTc3MiwgYW5kIHRoZSBSMmE9Ljk5NzEuIFRoZSByZXNpZHVhbCBzdGFuZGFyZCBlcnJvciBkZWNyZWFzZWQgY29tcGFyZWQgdG8gbW9kZWwgb25lLCBhbmQgd2UgaGF2ZSBhIHZlcnkgc2lnbmlmaWNhbnQgbW9kZWwuIEluIHRoZSBuZXh0IHdlZWsgb3Igc28sIHdlIHdpbGwgcHVsbCBpbiBtb3JlIGRhdGEgdG8gdmVyaWZ5LCBieSBwcmVkaWN0aW5nIG9uLCB0aGUgY3VycmVudCB2YWx1ZSBvZiBldGhlcmV1bS4gVGhpcyBpcyB2ZXJ5IGV4Y2l0aW5nISAKCkRhdGEgU2NpZW5jZSBSdWxlcyEgCgpVbnRpbCBuZXh0IHRpbWUsIAoKSm9lbCBBZnJpeWllCgoKCgoK