Question-2: Extended Encoding#

import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statistics
from sklearn.metrics import accuracy_score

Title#

Extending Candlestick Encoding for Improved Stock Price Prediction

Abstract#

Stock price predictability has traditionally been dismissed by the Efficient Market and Random Walk hypotheses, which hold that price changes are essentially random. However, recent advances in artificial intelligence and computational power are challenging this view, and emerging studies suggest that stock market behavior may be predictable to some degree.

This project explores the predictive potential of candlestick patterns, which visually represent stock price movements through four key values: high, close, open, and low. Traditional candlestick encoding comprises twelve distinct codes, generated by comparing these values. We propose an extension of this encoding by categorizing the relative size of the difference between open and close values as small, medium, or large, based on historical data. This refined encoding aims to capture additional nuances in stock price movement patterns, potentially enhancing the accuracy of predictive models.
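One way to read "small, medium, or large, based on historical data" is to compare each candle's body size \(|C_t - O_t|\) against quantiles of the body sizes observed in a historical window. The sketch below assumes tercile cut-offs (the 1/3 and 2/3 quantiles). The function names `body_size_thresholds` and `body_size_category`, the choice of terciles, and the toy prices are illustrative assumptions, not the project's stated method.

```python
import numpy as np
import pandas as pd

def body_size_thresholds(history, lower_q=1/3, upper_q=2/3):
    """Return (low, high) cut-offs computed from historical candle body sizes."""
    body = (history["Close"] - history["Open"]).abs()
    return body.quantile(lower_q), body.quantile(upper_q)

def body_size_category(df, low, high):
    """Label each candle's body as 'small', 'medium', or 'large'."""
    body = (df["Close"] - df["Open"]).abs()
    # The first matching condition wins; anything above `high` is 'large'.
    labels = np.select([body <= low, body <= high], ["small", "medium"], default="large")
    return pd.Series(labels, index=df.index, name="body_size")

# Tiny synthetic example (made-up prices, for illustration only).
toy = pd.DataFrame({
    "Open":  [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
    "Close": [100.1, 101.5, 101.0, 105.0, 104.2, 102.0],
})
low, high = body_size_thresholds(toy)
toy["body_size"] = body_size_category(toy, low, high)
print(toy)
```

In practice the thresholds would be fitted on the training period only and then applied unchanged to the test period, so that no information from the test period leaks into the encoding.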

Data#

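def get_training_test_data(stock='AMZN', start='2019-1-1', end='2021-1-31', training_ratio=0.96):
    # Download the daily price history and drop the last three columns,
    # leaving Open, High, Low, and Close.
    df = yf.Ticker(stock).history(start=start, end=end)
    df = df.iloc[:, :-3]
    df.reset_index(inplace=True)
    df['Date'] = [i.date() for i in df.Date]
    # fcc: sign of the next day's close minus today's close (+1 up, -1 down, 0 flat).
    # The last row has no next day, so its value is NaN.
    df['fcc'] = np.sign(df.Close.shift(-1) - df.Close)
    # Chronological split: the first training_ratio share of rows is the training set.
    training_length = int(len(df)*training_ratio)
    training_data = df.iloc[:training_length, :].copy()
    # reset_index(drop=True) returns a new frame, avoiding an in-place edit of a slice.
    test_data = df.iloc[training_length:, :].reset_index(drop=True)
    return (training_data, test_data)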
df_train, df_test = get_training_test_data()
df_train.shape, df_test.shape
((503, 6), (21, 6))

Extended Encoding#

Long White (Green) Candlestick

  • \(\displaystyle C_t - O_t > \frac{H_t-L_t}{2}\)

  • \(\displaystyle C_t - O_t > \frac{1}{10}\sum_{i=1}^{10}|C_{t-i}-O_{t-i}|\)

  • \(\displaystyle C_t - O_t > \frac{H_t+L_t}{2}\times0.03\)

Short White (Green) Candlestick

  • \(\displaystyle C_t - O_t < \frac{H_t-L_t}{2}\)

  • \(\displaystyle C_t - O_t < \frac{1}{10}\sum_{i=1}^{10}|C_{t-i}-O_{t-i}|\)

  • \(\displaystyle C_t - O_t < \frac{H_t+L_t}{2}\times0.03\)
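The three criteria listed above can be evaluated column-wise on an OHLC DataFrame. The sketch below is a minimal illustration under stated assumptions: it applies one criterion at a time, since the text does not say whether the three are alternatives or must hold jointly, and it assumes that a short white candle additionally satisfies \(C_t > O_t\). The names `candle_thresholds` and `classify_white`, the column labels, and the toy prices are hypothetical.

```python
import pandas as pd

def candle_thresholds(df, window=10, pct=0.03):
    """Compute the signed body and the three thresholds it is compared against."""
    body = df["Close"] - df["Open"]
    return pd.DataFrame({
        "body": body,
        # Half of the high-low range.
        "half_range": (df["High"] - df["Low"]) / 2,
        # Mean absolute body of the previous `window` candles (i = 1..window);
        # shift(1) excludes the current candle, and early rows are NaN.
        "avg_body": body.abs().shift(1).rolling(window).mean(),
        # `pct` (3% by default) of the high-low midpoint.
        "pct_mid": (df["High"] + df["Low"]) / 2 * pct,
    })

def classify_white(df, criterion="half_range", **kwargs):
    """Label rows 'long_white', 'short_white', or 'other' under one criterion."""
    t = candle_thresholds(df, **kwargs)
    labels = pd.Series("other", index=df.index)
    labels[t["body"] > t[criterion]] = "long_white"
    # Assumption: a short white candle must still close above its open.
    labels[(t["body"] > 0) & (t["body"] < t[criterion])] = "short_white"
    return labels

# Tiny synthetic example (made-up prices, for illustration only).
toy = pd.DataFrame({
    "Open":  [100.0, 100.5, 102.8, 101.5],
    "High":  [101.0, 103.0, 103.0, 106.0],
    "Low":   [99.5, 100.0, 101.0, 101.0],
    "Close": [100.5, 102.8, 101.5, 105.5],
})
by_range = classify_white(toy, "half_range")
by_avg = classify_white(toy, "avg_body", window=2)
by_pct = classify_white(toy, "pct_mid")
print(pd.DataFrame({"half_range": by_range, "avg_body": by_avg, "pct_mid": by_pct}))
```

Rows without enough history for the rolling average compare against NaN and therefore fall into "other" under the `avg_body` criterion.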