TimeSeries with Python - Differencing & Forecasting from Differences

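In this post we look at how to compute the first and second differences of a series, and how to use them to obtain forecasts for the original data. Differencing matters because it is one of the ways to turn non-stationary data into stationary data, and it is used especially often when the data has a trend.

Why differencing works will be explained in detail when we cover the ARIMA model. For now, as the title says, this post sticks to computing differences and building forecasts from them.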

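import pandas as pd

# 'path' is a placeholder for the CSV file; the first column is parsed as a datetime index
df = pd.read_csv('path', index_col = 0, parse_dates=True)
df.head()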

df['b'].plot(figsize=(12,5),ylim=[0,100], title='Non-Stationary Data').autoscale(axis='x', tight=True)

# First-order differencing

df['db'] = df['b']-df['b'].shift(1) # differencing with the pandas shift method
df['db'] = df['b'].diff() # differencing with the pandas diff method (same result)

df['db'].plot(title='First Order Difference of Non-Stationary Data').autoscale(axis='x', tight=True)
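The two lines above compute the same thing, and a small check can make that concrete. The sketch below uses a made-up toy series (not the post's data, so the names `toy`, `by_shift`, and `by_diff` are assumptions for illustration) and compares the `shift` version against `diff`.

```python
import pandas as pd

# Hypothetical toy data for illustration only: six monthly values with a trend.
idx = pd.date_range('1959-01-01', periods=6, freq='MS')
toy = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0, 30.0], index=idx)

by_shift = toy - toy.shift(1)  # y_t - y_{t-1}, written out with shift
by_diff = toy.diff()           # the same quantity via diff

# The first element is NaN in both, because there is no earlier value to subtract.
same = by_shift.equals(by_diff)
print(by_diff.tolist())
print(same)
```

Note that one observation is lost per round of differencing, which is why the first value of `db` is NaN.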


idx = pd.date_range('1960-01-01', periods=5, freq='MS') # build an index of 5 month-start dates, from 1960-01-01 to 1960-05-01

z = pd.DataFrame([7,-2,5,-1,12], index=idx, columns=['Forecast of Diff'])

z['Forecast'] = df['b'].iloc[-1] + z['Forecast of Diff'].cumsum()

df['b'].plot(title='Forecast from First Differences').autoscale(axis='x', tight=True)
z['Forecast'].plot() # figure 3
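The forecast above works because a cumulative sum undoes a first difference: the last observed value plus the running total of the forecast differences gives the forecast levels. As a rough check on made-up numbers (the series `toy` and the train/holdout split are assumptions, not the post's data), the sketch below hides the last two values, feeds in their true differences, and rebuilds them.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0, 30.0])

train = toy.iloc[:4]     # pretend only the first four values are known
holdout = toy.iloc[4:]   # the two values we try to rebuild

# Use the true differences of the hidden part as a stand-in for forecast differences.
true_diffs = toy.diff().iloc[4:]

# Last known level + cumulative sum of differences = rebuilt levels.
rebuilt = train.iloc[-1] + true_diffs.cumsum()
print(rebuilt.tolist())
print(rebuilt.equals(holdout))
```

With real forecasts the differences would come from a model rather than from the data itself, so the rebuilt levels would only be as good as those difference forecasts.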


df['d1c'] = df['c'].diff() # first difference of the data

df['d1c'].plot(title='First Order Diff').autoscale(axis='x', tight=True) 

df['d2c'] = df['c'].diff().diff()

df['d2c'].plot(title='Second Order Diff').autoscale(axis='x', tight=True)
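Applying `diff()` twice gives the second difference, which written out is y_t - 2*y_{t-1} + y_{t-2}. The sketch below checks this on a made-up series of squares (an assumption chosen because its second difference is constant), comparing `diff().diff()` with the explicit formula.

```python
import pandas as pd

# Hypothetical toy data for illustration only: 1, 4, 9, ... (perfect squares).
toy = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])

d2 = toy.diff().diff()                               # difference of the difference
explicit = toy - 2 * toy.shift(1) + toy.shift(2)     # y_t - 2*y_{t-1} + y_{t-2}

# The first two elements are NaN: each round of differencing loses one observation.
print(d2.tolist())
print(d2.equals(explicit))
```

This explicit form is also where the `2*v1 - v2` term in the forecast loop comes from: solving the formula for y_t gives y_t = d2_t + 2*y_{t-1} - y_{t-2}.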

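# As with the first difference, build the future index and the forecast differences again, then compute the forecast from the last observed values and the last first difference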

idx = pd.date_range('1960-01-01', periods=5, freq='MS')

z = pd.DataFrame([7,-2,5,-1,12], index=idx, columns=['Forecast of Diff'])

forecast = []

v2, v1 = df['c'].iloc[-2:] # the last two observations (v1 is the most recent)

for i in z['Forecast of Diff']:
    newval = i + 2*v1 - v2
    forecast.append(newval)
    v2, v1 = v1, newval

z['Forecast'] = forecast

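# A for loop can build the forecast from second differences, but the more intuitive cumsum approach used for the first difference also works
# First, add the cumulative sum of the forecast second differences to the last observed first difference
# That is, the cumsum of the second differences gives the forecast first differences, which are then applied to the original data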

z['firstdiff'] = (df['c'].iloc[-1]-df['c'].iloc[-2]) + z['Forecast of Diff'].cumsum()

z['Forecast'] = df['c'].iloc[-1]+z['firstdiff'].cumsum()

df['c'].plot(figsize=(12,5), title='Forecast').autoscale(axis='x', tight=True)
z['Forecast'].plot()
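The loop and the double-cumsum version should produce identical forecasts. A minimal sketch to confirm this, using the same five forecast second differences as above but a made-up history (the `history` values are an assumption, since the post's data file is not available here):

```python
import pandas as pd

# Hypothetical observed history for illustration only.
history = pd.Series([3.0, 5.0, 8.0, 12.0])
# Forecast second differences, same values as in the post.
d2_forecast = pd.Series([7.0, -2.0, 5.0, -1.0, 12.0])

# Method 1: iterate y_t = d2_t + 2*y_{t-1} - y_{t-2}.
v2, v1 = history.iloc[-2:]
loop_forecast = []
for d in d2_forecast:
    newval = d + 2 * v1 - v2
    loop_forecast.append(newval)
    v2, v1 = v1, newval

# Method 2: cumsum twice, seeding each level with the last observed value.
last_d1 = history.iloc[-1] - history.iloc[-2]            # last observed first difference
d1_forecast = last_d1 + d2_forecast.cumsum()             # forecast first differences
cumsum_forecast = history.iloc[-1] + d1_forecast.cumsum()  # forecast levels

print(loop_forecast)
print(cumsum_forecast.tolist())
```

Both approaches need two starting values from the observed data: the last level and the last first difference, which is equivalent to knowing the last two observations.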
