Machine Learning: A Bitcoin Price Prediction Model Using Time Series Forecasting

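This article is about predicting the Bitcoin price with time series forecasting. Time series forecasting differs from other machine learning problems in two important ways:

  • 1. Time dependence. The basic assumption of a linear regression model, that observations are independent of each other, does not hold here.
  • 2. Besides an increasing or decreasing trend, most time series show some form of seasonality, that is, variation specific to a particular time frame.

For these reasons, simple machine learning models cannot be applied directly, and time series forecasting is treated as its own field of study. In this article, time series models such as AR, MA and ARIMA are used to forecast the price of Bitcoin.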
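
To make the time-dependence point concrete, here is a minimal sketch on synthetic data (not the Bitcoin dataset). It compares the lag-1 autocorrelation of independent noise with that of a random walk built from the same noise. The variable names and the seed are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Independent observations: white noise has no memory of earlier values.
noise = pd.Series(rng.normal(size=500))

# Time-dependent observations: a random walk accumulates all past shocks.
random_walk = noise.cumsum()

# Lag-1 autocorrelation: correlation between each value and the one before it.
noise_autocorr = noise.autocorr(lag=1)
walk_autocorr = random_walk.autocorr(lag=1)

print(f"white noise lag-1 autocorrelation: {noise_autocorr:.3f}")
print(f"random walk lag-1 autocorrelation: {walk_autocorr:.3f}")
```

The noise series should show an autocorrelation near zero, while the random walk should be close to one, which is the kind of dependence that violates the independence assumption.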

The dataset contains the opening and closing prices of Bitcoin from April 2013 to August 2017.

Importing the necessary libraries

import pandas as kunfu
import numpy as dragon
import pylab as p
import matplotlib.pyplot as plot
from collections import Counter
import re

# Packages for time series modelling and evaluation
import statsmodels.api as sm
import statsmodels.tsa.api as smt
import statsmodels.formula.api as smf
from sklearn.metrics import mean_squared_error

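Plotting the time series

The data is loaded into the train DataFrame. With the date set as the index, the series is plotted with dates on the x-axis and the closing price on the y-axis.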

# `train` is the DataFrame holding the dataset; it needs `Date` and `Close` columns
data = train['Close']
Date1 = train['Date']
train1 = train[['Date', 'Close']]

# Set the date as the index and sort chronologically
train2 = train1.set_index('Date')
train2.sort_index(inplace=True)
print(type(train2))
print(train2.head())

plot.plot(train2)
plot.xlabel('Date', fontsize=12)
plot.ylabel('Price in USD', fontsize=12)
plot.title("Closing price distribution of bitcoin", fontsize=15)
plot.show()

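Testing for stationarity

Augmented Dickey-Fuller test:

The augmented Dickey-Fuller (ADF) test is a statistical test of the kind known as a unit root test.

The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend.

The ADF test is one of the most widely used unit root tests.

  • 1. Null hypothesis (H0): the time series can be represented by a unit root, that is, it is non-stationary.
  • 2. Alternative hypothesis (H1): the time series is stationary.

Interpreting the ADF result:

  • 1. p-value > 0.05: fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
  • 2. p-value <= 0.05: reject the null hypothesis (H0); the data is stationary.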
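
The idea behind the test can be sketched without statsmodels. The snippet below is a simplified, non-augmented Dickey-Fuller regression (no lagged difference terms) run on synthetic data, so it is an illustration of the principle rather than a replacement for `adfuller`. The function name, the seed, and the approximate large-sample 5% critical value of -2.86 (constant, no trend) are assumptions made for this example.

```python
import numpy as np

def dickey_fuller_t_stat(y):
    """t-statistic of rho in: diff(y)[t] = alpha + rho * y[t-1] + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
shocks = rng.normal(size=1000)

stat_stationary = dickey_fuller_t_stat(shocks)           # white noise
stat_unit_root = dickey_fuller_t_stat(np.cumsum(shocks)) # random walk

CRITICAL_5PCT = -2.86  # approximate value, assumed for this sketch

print("white noise t-stat: %.2f" % stat_stationary)
print("random walk t-stat: %.2f" % stat_unit_root)
print("white noise rejects H0:", stat_stationary < CRITICAL_5PCT)
```

A strongly negative statistic, below the critical value, is evidence against a unit root. The white-noise series should land far below the threshold, while a random walk typically does not.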

from statsmodels.tsa.stattools import adfuller

def test_stationarity(x):
    # Determine rolling statistics
    rolmean = x.rolling(window=22, center=False).mean()
    rolstd = x.rolling(window=12, center=False).std()

    # Plot rolling statistics
    orig = plot.plot(x, color='blue', label='Original')
    mean = plot.plot(rolmean, color='red', label='Rolling Mean')
    std = plot.plot(rolstd, color='black', label='Rolling Std')
    plot.legend(loc='best')
    plot.title('Rolling Mean & Standard Deviation')
    plot.show(block=False)

    # Perform the augmented Dickey-Fuller test
    result = adfuller(x)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    pvalue = result[1]

    # Compare the test statistic with the first critical value returned;
    # the loop always exits after its first iteration
    for key, value in result[4].items():
        if result[0] > value:
            print("The graph is non stationary")
        else:
            print("The graph is stationary")
        break

    print('Critical values:')
    for key, value in result[4].items():
        print('\t%s: %.3f ' % (key, value))

ts = train2['Close']
test_stationarity(ts)

Log-transforming the series

A log transform is used to correct highly skewed data, which helps the forecasting process.

ts_log = dragon.log(ts)
plot.plot(ts_log, color="green")
plot.show()

test_stationarity(ts_log)
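
As a small illustration of why the log transform helps, the sketch below uses a hypothetical price path that grows by 1% per step (made-up data, not the Bitcoin series). On the raw scale the steps get larger as the price rises; on the log scale every step is the same size and the skew disappears.

```python
import numpy as np
import pandas as pd

# Hypothetical price path: 1% compound growth per step.
steps = np.arange(200)
price = pd.Series(100.0 * 1.01 ** steps)
log_price = np.log(price)

# Step sizes on each scale.
raw_steps = price.diff().dropna()
log_steps = log_price.diff().dropna()

print("raw step: first %.3f, last %.3f" % (raw_steps.iloc[0], raw_steps.iloc[-1]))
print("log step: first %.5f, last %.5f" % (log_steps.iloc[0], log_steps.iloc[-1]))
print("skewness: raw %.3f, log %.3f" % (price.skew(), log_price.skew()))
```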

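Removing trend and seasonality with decomposition

Decomposition is a technique in which the seasonal and trend components of the series are removed, and the model is then applied to the residual series.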

# Naive decomposition of our time series as explained above
from statsmodels.tsa.seasonal import seasonal_decompose

# A period of 7 assumes a weekly cycle in daily data.
# Newer statsmodels releases name this argument `period` instead of `freq`.
decomposition = seasonal_decompose(ts_log, model='multiplicative', freq=7)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

plot.subplot(411)
# The model is multiplicative, so the components multiply rather than add
plot.title('Observed = Trend x Seasonality x Residuals')
plot.plot(ts_log, label='Observed')
plot.legend(loc='best')

plot.subplot(412)
plot.plot(trend, label='Trend')
plot.legend(loc='best')

plot.subplot(413)
plot.plot(seasonal, label='Seasonality')
plot.legend(loc='best')

plot.subplot(414)
plot.plot(residual, label='Residuals')
plot.legend(loc='best')

plot.tight_layout()
plot.show()
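
To show what a naive decomposition does internally, here is a hand-rolled additive version on synthetic data with a known weekly pattern. It is a sketch of the general idea (moving-average trend, per-position seasonal means), not the exact statsmodels algorithm, and it is additive rather than multiplicative. The data and all names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

period = 7
n = 10 * period
t = np.arange(n)

# Known components: linear trend plus a weekly pattern that sums to zero.
pattern = np.array([0.3, -0.1, 0.0, 0.2, -0.4, 0.1, -0.1])
series = pd.Series(0.05 * t + np.tile(pattern, n // period))

# Trend: centred moving average over one full period.
trend = series.rolling(window=period, center=True).mean()

# Seasonality: average detrended value at each position in the cycle.
detrended = series - trend
seasonal_means = detrended.groupby(t % period).mean()
seasonal = pd.Series(seasonal_means.to_numpy()[t % period], index=series.index)

# Residual: whatever trend and seasonality do not explain.
residual = series - trend - seasonal

print(seasonal_means.round(3).tolist())
print("largest residual:", residual.abs().max())
```

Because the synthetic series contains no noise, the recovered seasonal means match the pattern and the residuals are essentially zero. The first and last three points are undefined because the centred window does not fit there.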

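Removing trend and seasonality with differencing

To make a time series stationary, the previous value is subtracted from the current value. This stabilizes the mean, which makes the series more likely to be stationary.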

ts_log_diff = ts_log - ts_log.shift()
plot.plot(ts_log_diff)
plot.show()

ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)
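
The stabilizing effect of differencing on the mean can be seen in a small synthetic example. The series below is a made-up linear trend plus noise, standing in for a log-price series. The sketch compares the mean of the first and second halves before and after differencing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
t = np.arange(300)

# Hypothetical log-price: linear trend plus stationary noise.
log_series = pd.Series(0.02 * t + rng.normal(scale=0.1, size=t.size))

# First difference: current value minus previous value.
diffed = (log_series - log_series.shift()).dropna()

# Gap between the mean of the second half and the mean of the first half.
half = len(log_series) // 2
level_gap = log_series.iloc[half:].mean() - log_series.iloc[:half].mean()

diff_half = len(diffed) // 2
diff_gap = diffed.iloc[diff_half:].mean() - diffed.iloc[:diff_half].mean()

print("mean shift before differencing: %.3f" % level_gap)
print("mean shift after differencing:  %.3f" % diff_gap)
```

The original series has a mean that drifts upward over time, while the differenced series has roughly the same mean in both halves.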

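Since our time series is now stationary, we can apply time series forecasting models.

Autoregressive model

An autoregressive (AR) model is a time series forecasting model in which the current value depends on past values.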

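# Legacy ARIMA interface; it is no longer available in statsmodels 0.13 and later
from statsmodels.tsa.arima_model import ARIMA

# AR model: one autoregressive lag (p=1), one difference (d=1), no MA term (q=0)
model = ARIMA(ts_log, order=(1, 1, 0))
results_ARIMA = model.fit(disp=-1)

plot.plot(ts_log_diff)
plot.plot(results_ARIMA.fittedvalues, color='red')
plot.title('RSS: %.7f' % sum((results_ARIMA.fittedvalues - ts_log_diff)**2))
plot.show()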
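
What "the current value depends on past values" means can be sketched with a first-order AR process on simulated data. The snippet estimates the AR coefficient by ordinary least squares rather than with statsmodels. The coefficient 0.6, the seed, and the names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
true_phi = 0.6
n = 2000

# Simulate AR(1): x[t] = phi * x[t-1] + noise
x = np.zeros(n)
for i in range(1, n):
    x[i] = true_phi * x[i - 1] + rng.normal()

# Least-squares estimate of phi from regressing x[t] on x[t-1] (no intercept).
lagged, current = x[:-1], x[1:]
phi_hat = (lagged @ current) / (lagged @ lagged)

# One-step-ahead forecast from the last observed value.
one_step_forecast = phi_hat * x[-1]

print("estimated phi: %.3f" % phi_hat)
print("next-value forecast: %.3f" % one_step_forecast)
```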

Moving average model

In a moving average (MA) model, the series depends on past error terms.

# MA model: no autoregressive lag (p=0), one difference (d=1), one lagged error term (q=1)
model = ARIMA(ts_log, order=(0, 1, 1))
results_MA = model.fit(disp=-1)

plot.plot(ts_log_diff)
plot.plot(results_MA.fittedvalues, color='red')
plot.title('RSS: %.7f' % sum((results_MA.fittedvalues - ts_log_diff)**2))
plot.show()
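
A characteristic property of a first-order MA process is that its autocorrelation is non-zero at lag 1 and vanishes at longer lags. The sketch below simulates such a process and checks this against the textbook lag-1 value theta / (1 + theta^2). The value of theta, the seed, and the helper name are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 0.8

# MA(1): each value is the current error plus theta times the previous error.
errors = rng.normal(size=5001)
ma_series = errors[1:] + theta * errors[:-1]

def autocorr(values, lag):
    """Sample autocorrelation at the given positive lag."""
    centred = values - values.mean()
    return float((centred[:-lag] @ centred[lag:]) / (centred @ centred))

rho1 = autocorr(ma_series, 1)
rho2 = autocorr(ma_series, 2)
theoretical_rho1 = theta / (1 + theta ** 2)

print("lag 1: sample %.3f, theoretical %.3f" % (rho1, theoretical_rho1))
print("lag 2: sample %.3f, theoretical 0" % rho2)
```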

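Autoregressive integrated moving average model

ARIMA is a combination of the AR and MA models. It makes the time series stationary itself through differencing, so differencing does not have to be carried out explicitly before fitting an ARIMA model.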

from statsmodels.tsa.arima_model import ARIMA

model = ARIMA(ts_log, order=(8, 1, 0))
results_ARIMA = model.fit(disp=-1)

plot.plot(ts_log_diff)
plot.plot(results_ARIMA.fittedvalues, color='red')
plot.title('RSS: %.7f' % sum((results_ARIMA.fittedvalues - ts_log_diff)**2))
plot.show()
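
Because the fitted values above are compared with the differenced log series, they have to be accumulated and exponentiated before they can be read as prices. The sketch below shows that back-transformation with made-up prices and a perfect-fit stand-in for the model output, so it demonstrates only the arithmetic, not a real model.

```python
import numpy as np
import pandas as pd

# Made-up closing prices.
prices = pd.Series([100.0, 104.0, 101.0, 108.0, 112.0])
log_prices = np.log(prices)

# Stand-in for model output on the differenced log scale (here: a perfect fit).
fitted_diffs = log_prices.diff().dropna()

# Undo the differencing: start from the first log price and accumulate changes.
fitted_log = log_prices.iloc[0] + fitted_diffs.cumsum()

# Undo the log transform to return to the price scale.
fitted_prices = np.exp(fitted_log)

print(fitted_prices.round(2).tolist())
```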

# Walk-forward validation: hold out the last 100 observations and refit
# the model each time a new observation is added to the history
size = int(len(ts_log) - 100)
train_arima, test_arima = ts_log[0:size], ts_log[size:len(ts_log)]
history = [x for x in train_arima]
predictions = list()
originals = list()
error_list = list()

print('Printing Predicted vs Expected Values...')
print('\n')

for t in range(len(test_arima)):
    model = ARIMA(history, order=(2, 1, 0))
    model_fit = model.fit(disp=-1)

    # forecast() returns (forecast, stderr, conf_int); keep the one-step forecast
    output = model_fit.forecast()
    pred_value = output[0][0]
    original_value = test_arima.iloc[t]
    history.append(original_value)

    # Convert from the log scale back to prices
    pred_value = dragon.exp(pred_value)
    original_value = dragon.exp(original_value)

    # Calculate the absolute percentage error
    error = ((abs(pred_value - original_value)) / original_value) * 100
    error_list.append(error)
    print('predicted = %f, expected = %f, error = %f ' % (pred_value, original_value, error), '%')

    predictions.append(float(pred_value))
    originals.append(float(original_value))

print('\n Means Error in Predicting Test Case Articles : %f ' % (sum(error_list)/float(len(error_list))), '%')

plot.figure(figsize=(8, 6))
test_day = [t for t in range(len(test_arima))]
labels = ['Predicted', 'Original']  # same order as the plot calls below
plot.plot(test_day, predictions, color='green')
plot.plot(test_day, originals, color='orange')
plot.title('Expected Vs Predicted Closing Price Forecasting')
plot.xlabel('Day')
plot.ylabel('Closing Price')
plot.legend(labels)
plot.show()

predicted = 2513.745189, expected = 2564.060000, error = 1.962310 %
predicted = 2566.007269, expected = 2601.640000, error = 1.369626 %
predicted = 2604.348629, expected = 2601.990000, error = 0.090647 %
predicted = 2605.558976, expected = 2608.560000, error = 0.115045 %
predicted = 2613.835793, expected = 2518.660000, error = 3.778827 %
predicted = 2523.203681, expected = 2571.340000, error = 1.872032 %
predicted = 2580.654927, expected = 2518.440000, error = 2.470376 %
predicted = 2521.053567, expected = 2372.560000, error = 6.258791 %
predicted = 2379.066829, expected = 2337.790000, error = 1.765635 %
predicted = 2348.468544, expected = 2398.840000, error = 2.099826 %
predicted = 2405.299995, expected = 2357.900000, error = 2.010263 %
predicted = 2359.650935, expected = 2233.340000, error = 5.655697 %
predicted = 2239.002236, expected = 1998.860000, error = 12.013960 %
predicted = 2006.206534, expected = 1929.820000, error = 3.958221 %
predicted = 1942.244784, expected = 2228.410000, error = 12.841677 %
predicted = 2238.150016, expected = 2318.880000, error = 3.481421 %
predicted = 2307.325788, expected = 2273.430000, error = 1.490954 %
predicted = 2272.890197, expected = 2817.600000, error = 19.332404 %
predicted = 2829.051277, expected = 2667.760000, error = 6.045944 %
predicted = 2646.110662, expected = 2810.120000, error = 5.836382 %
predicted = 2822.356853, expected = 2730.400000, error = 3.367889 %
predicted = 2730.087031, expected = 2754.860000, error = 0.899246 %
predicted = 2763.766195, expected = 2576.480000, error = 7.269072 %
predicted = 2580.946838, expected = 2529.450000, error = 2.035891 %
predicted = 2541.493507, expected = 2671.780000, error = 4.876393 %
predicted = 2679.029936, expected = 2809.010000, error = 4.627255 %
predicted = 2808.092238, expected = 2726.450000, error = 2.994452 %
predicted = 2726.150588, expected = 2757.180000, error = 1.125404 %
predicted = 2766.298163, expected = 2875.340000, error = 3.792311 %

Means Error in Predicting Test Case Articles : 3.593133 %

The original and predicted time series are plotted above, and the mean error between them is 3.59%.
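
One way to put an error figure like this in context is to compare it with a naive persistence baseline, where each day's forecast is simply the previous day's closing price. The sketch below applies the same percentage-error formula to the first ten expected values printed above. It covers only those ten rows, not the full 100-step test set, and the function and variable names are assumptions for illustration.

```python
# First ten "expected" closing prices from the printed output above.
expected = [2564.06, 2601.64, 2601.99, 2608.56, 2518.66,
            2571.34, 2518.44, 2372.56, 2337.79, 2398.84]

def mean_abs_pct_error(predicted, actual):
    """Mean of |predicted - actual| / actual * 100 over paired values."""
    errors = [abs(p - a) / a * 100 for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)

# Persistence baseline: forecast each day with the previous day's value.
naive_predictions = expected[:-1]
naive_actuals = expected[1:]
naive_error = mean_abs_pct_error(naive_predictions, naive_actuals)

print("persistence baseline error = %f %%" % naive_error)
```

If a fitted model does not clearly beat this baseline over the same test window, most of its apparent accuracy comes from the previous day's price rather than from the model itself.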

