Stock Data Preparation

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt

There are various ways to import a stock's data for a specific period using its symbol. One of the most popular tools is yfinance, a Python package that conveniently fetches historical stock price data via the Yahoo Finance API.

Description of Data

  • Each stock has a unique symbol of up to five characters, which may include letters, ‘.’, and ‘-’.

Examples:

Company     Symbol
Alphabet    GOOGL
Amazon      AMZN
Apple       AAPL
Visa        V
Allstate    ALL
Tesla       TSLA

  • The symbol is used to import historical data.

  • The history() method returns a DataFrame with the date as index and 7 columns:

    • Open: The initial price of the stock at the beginning of the day

    • High: The highest price of the stock during the day

    • Low: The lowest price of the stock during the day

    • Close: The final price of the stock at the end of the day

    • Volume: The number of shares traded during the day

    • Dividends: The share of company earnings distributed to its investors

    • Stock Splits: The subdivision of each share of stock into a fixed number of units

  • The history() method by default returns data for the business days of the last month.

    • Some dates may be missing, representing days when the market is closed.

    • The index includes dates and times.

    • start and end parameters allow access to data within a specific range.

      • Dates should be in the format ‘YEAR-MONTH-DAY’, where the month is numerical.
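The date format described above can be checked without any download. The sketch below uses pandas only to parse the two strings used later on this page. Passing the same strings to start and end is what the later cells do; the weekday check is an added illustration, not part of the original workflow.

```python
import pandas as pd

# 'YEAR-MONTH-DAY' strings with a numerical month; zero padding is optional.
start = pd.Timestamp('1995-1-1')
end = pd.Timestamp('2000-12-31')

print(start.date(), end.date())  # 1995-01-01 2000-12-31

# 1995-01-01 falls on a weekend, so a range starting there
# begins with a later trading day.
print(start.day_name())  # Sunday
```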

Import Data

  • By default, history() returns only the last month of historical stock price data.

df = yf.Ticker('AAPL').history()
df.head().round(2)
Open High Low Close Volume Dividends Stock Splits
Date
2024-12-09 00:00:00-05:00 241.83 247.24 241.75 246.75 44649200 0.0 0.0
2024-12-10 00:00:00-05:00 246.89 248.21 245.34 247.77 36914800 0.0 0.0
2024-12-11 00:00:00-05:00 247.96 250.80 246.26 246.49 45205800 0.0 0.0
2024-12-12 00:00:00-05:00 246.89 248.74 245.68 247.96 32777500 0.0 0.0
2024-12-13 00:00:00-05:00 247.82 249.29 246.24 248.13 33155300 0.0 0.0
  • Data is available only for business days when the stock market is open.

# data of 21 days
df.shape
(21, 7)
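To see why a calendar month yields only about 21 rows, weekdays and market closures can be counted with pandas alone. This is a minimal sketch under stated assumptions: the window below is hypothetical (the exact end date of the download above is not shown), and the single holiday is supplied by hand rather than taken from an exchange calendar.

```python
import pandas as pd

# Hypothetical window; not the exact range returned above.
start, end = '2024-12-09', '2024-12-31'

# Monday to Friday only.
weekdays = pd.bdate_range(start, end)

# Same range with one manually listed closure (Christmas Day) removed.
trading_days = pd.bdate_range(start, end, freq='C', holidays=['2024-12-25'])

print(len(weekdays), len(trading_days))  # 17 16
```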
  • Setting period='max' returns all available data for a stock.

df = yf.Ticker('AAPL').history(period='max')
df.head().round(2)
Open High Low Close Volume Dividends Stock Splits
Date
1980-12-12 00:00:00-05:00 0.10 0.10 0.10 0.10 469033600 0.0 0.0
1980-12-15 00:00:00-05:00 0.09 0.09 0.09 0.09 175884800 0.0 0.0
1980-12-16 00:00:00-05:00 0.09 0.09 0.09 0.09 105728000 0.0 0.0
1980-12-17 00:00:00-05:00 0.09 0.09 0.09 0.09 86441600 0.0 0.0
1980-12-18 00:00:00-05:00 0.09 0.09 0.09 0.09 73449600 0.0 0.0
  • Setting start and end returns daily data for a specific range, here between ‘1995-1-1’ and ‘2000-12-31’, with dates in the form ‘YEAR-MONTH-DAY’.

df = yf.Ticker('AAPL').history(start='1995-1-1', end='2000-12-31')
df.head().round(2)
Open High Low Close Volume Dividends Stock Splits
Date
1995-01-03 00:00:00-05:00 0.29 0.29 0.28 0.29 103868800 0.0 0.0
1995-01-04 00:00:00-05:00 0.29 0.30 0.29 0.29 158681600 0.0 0.0
1995-01-05 00:00:00-05:00 0.29 0.29 0.29 0.29 73640000 0.0 0.0
1995-01-06 00:00:00-05:00 0.31 0.32 0.31 0.31 1076622400 0.0 0.0
1995-01-09 00:00:00-05:00 0.31 0.31 0.31 0.31 274086400 0.0 0.0
  • Remove the last two columns.

df = yf.Ticker('AAPL').history(start='1995-1-1', end='2000-12-31').iloc[:,:-2]
df.head().round(2)
Open High Low Close Volume
Date
1995-01-03 00:00:00-05:00 0.29 0.29 0.28 0.29 103868800
1995-01-04 00:00:00-05:00 0.29 0.30 0.29 0.29 158681600
1995-01-05 00:00:00-05:00 0.29 0.29 0.29 0.29 73640000
1995-01-06 00:00:00-05:00 0.31 0.32 0.31 0.31 1076622400
1995-01-09 00:00:00-05:00 0.31 0.31 0.31 0.31 274086400
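Slicing by position with iloc relies on Dividends and Stock Splits being the last two columns. An alternative sketch drops them by name instead. The frame below is a small synthetic stand-in with made-up values and an assumed New York time zone, so it runs without a download.

```python
import pandas as pd

# Synthetic stand-in for the frame returned by history(); values are made up.
df = pd.DataFrame(
    {
        'Open': [1.0, 1.1], 'High': [1.2, 1.3],
        'Low': [0.9, 1.0], 'Close': [1.1, 1.2],
        'Volume': [100, 200],
        'Dividends': [0.0, 0.0], 'Stock Splits': [0.0, 0.0],
    },
    index=pd.to_datetime(['1995-01-03', '1995-01-04']).tz_localize('America/New_York'),
)
df.index.name = 'Date'

# Positional slice, as in the cell above.
by_position = df.iloc[:, :-2]

# Dropping by name does not depend on column order.
by_name = df.drop(columns=['Dividends', 'Stock Splits'])

print(by_name.columns.tolist())
```

Both approaches give the same result here; dropping by name is simply less sensitive to changes in the column layout.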

Remove Time

In this part, we will remove the time from the index dates.

# Reset the index and set the previous index as the Date column.
df.reset_index(inplace=True)
df.head().round(2)
Date Open High Low Close Volume
0 1995-01-03 00:00:00-05:00 0.29 0.29 0.28 0.29 103868800
1 1995-01-04 00:00:00-05:00 0.29 0.30 0.29 0.29 158681600
2 1995-01-05 00:00:00-05:00 0.29 0.29 0.29 0.29 73640000
3 1995-01-06 00:00:00-05:00 0.31 0.32 0.31 0.31 1076622400
4 1995-01-09 00:00:00-05:00 0.31 0.31 0.31 0.31 274086400
# Use date() method to access only the date part and assign it as the new values for the 'Date' column.
df['Date'] = [i.date() for i in df.Date]
df.head().round(2)
Date Open High Low Close Volume
0 1995-01-03 0.29 0.29 0.28 0.29 103868800
1 1995-01-04 0.29 0.30 0.29 0.29 158681600
2 1995-01-05 0.29 0.29 0.29 0.29 73640000
3 1995-01-06 0.31 0.32 0.31 0.31 1076622400
4 1995-01-09 0.31 0.31 0.31 0.31 274086400
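The list comprehension above can also be written with the pandas .dt accessor. The sketch below uses a synthetic two-row frame with an assumed New York time zone and shows two variants: one that yields Python date objects, as above, and one that only strips the time zone and keeps a datetime column.

```python
import pandas as pd

# Synthetic stand-in with time-zone-aware timestamps; values are made up.
dates = pd.to_datetime(['1995-01-03', '1995-01-04']).tz_localize('America/New_York')
df = pd.DataFrame({'Date': dates, 'Close': [0.29, 0.29]})

# Variant 1: Python date objects, equivalent to [i.date() for i in df.Date].
as_dates = df['Date'].dt.date

# Variant 2: drop the time zone but keep a datetime column (midnight timestamps).
as_naive = df['Date'].dt.tz_localize(None)

print(as_dates.iloc[0])  # 1995-01-03
print(as_naive.iloc[0])  # 1995-01-03 00:00:00
```

Variant 1 produces an object-dtype column, while variant 2 keeps datetime functionality such as resampling and date arithmetic.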