Stock Data Preparation
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
There are various ways to import historical stock data for a specific period using the stock's symbol. One of the most popular tools is yfinance, an open-source Python package that conveniently fetches historical stock price data from Yahoo Finance.
Description of Data
Each stock has a unique symbol (ticker) of up to five characters, which may include letters, ‘.’, and ‘-’.
Examples:
Company | Symbol |
---|---|
Alphabet | GOOGL |
Amazon | AMZN |
Apple | AAPL |
Visa | V |
Allstate | ALL |
Tesla | TSLA |
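The symbol rule above can be encoded as a quick sanity check before requesting data. This is a minimal sketch, assuming symbols are written in uppercase; the helper name `looks_like_symbol` is hypothetical. Real tickers vary more than this rule allows (for example, symbols with exchange suffixes), so treat it as a rough filter rather than a definitive validator.

```python
import re

# Simplified rule from the text: 1 to 5 characters, each an uppercase
# letter, '.' or '-'. Uppercase-only is an assumption of this sketch.
SYMBOL_PATTERN = re.compile(r"[A-Z.\-]{1,5}")


def looks_like_symbol(text: str) -> bool:
    """Return True if `text` satisfies the simplified ticker rule."""
    return SYMBOL_PATTERN.fullmatch(text) is not None


for candidate in ["GOOGL", "AMZN", "V", "BRK-B", "TOOLONG"]:
    print(candidate, looks_like_symbol(candidate))
# GOOGL True
# AMZN True
# V True
# BRK-B True
# TOOLONG False
```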
The symbol is passed to yf.Ticker() to import historical data. The history() method returns a DataFrame indexed by date, with 7 columns:
Open: The price of the stock at the start of the trading day
High: The highest price of the stock during the day
Low: The lowest price of the stock during the day
Close: The price of the stock at the end of the trading day
Volume: The number of shares traded during the day
Dividends: The cash dividend per share recorded on that date (0 on other days)
Stock Splits: The split ratio on days when the company subdivides each share into several shares, e.g. 4.0 for a 4-for-1 split (0 on other days)
By default (auto_adjust=True), Open, High, Low, and Close are adjusted for splits and dividends, so past prices can differ from the prices quoted at the time.
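To see what the Stock Splits column means in practice, the sketch below back-adjusts a price series for splits: every price before a split is divided by the split ratio, so the series has no artificial drop on the split date. This is an illustration on hypothetical data, not yfinance's internal code, and it ignores dividends. Because history() returns adjusted prices by default, Apple's 1980 prices further down appear as fractions of a dollar.

```python
import pandas as pd

# Hypothetical raw (unadjusted) closes around a 2-for-1 split on 2024-01-04:
# the quoted price halves overnight although the company's value did not.
df = pd.DataFrame(
    {"Close": [100.0, 102.0, 51.0, 52.0], "Stock Splits": [0.0, 0.0, 2.0, 0.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# 0.0 means "no split on this day"; use a neutral ratio of 1 instead
ratios = df["Stock Splits"].replace(0.0, 1.0)

# Product of the split ratios that occur strictly after each day:
# reverse, take a running product, reverse back, then shift up one row
factor = ratios[::-1].cumprod()[::-1].shift(-1, fill_value=1.0)

# Pre-split prices are divided by the ratio; the split day is already post-split
df["Split-adjusted Close"] = df["Close"] / factor
print(df)
```

After adjustment the closes read 50, 51, 51, 52, so the split no longer looks like a 50% crash.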
By default, the history() method returns daily data for the last month. Only trading days are included, so weekends and market holidays are missing from the index.
The index is timezone-aware: each entry includes a time and a UTC offset along with the date (e.g., 2024-12-09 00:00:00-05:00).
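Missing dates can be found by comparing the index with the full set of weekdays in the same span. The sketch below uses a hypothetical December 2024 index shaped like the one history() returns (midnight timestamps in New York time) with December 25 removed as a market holiday; the helper name `missing_business_days` is illustrative. pandas' business-day calendar only skips weekends, so every date it reports is a weekday with no data, usually a market holiday.

```python
import pandas as pd


def missing_business_days(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return weekdays between the first and last date of `index` that are absent from it."""
    # bdate_range yields Monday-Friday dates; a tz-aware start keeps the timezone
    expected = pd.bdate_range(index.min(), index.max())
    return expected.difference(index)


# Hypothetical index: December 2024 weekdays in New York time, minus Christmas
tz = "America/New_York"
weekdays = pd.bdate_range("2024-12-02", "2024-12-31", tz=tz)
trading_days = weekdays.drop(pd.Timestamp("2024-12-25", tz=tz))

print(trading_days[0])  # 2024-12-02 00:00:00-05:00
print(missing_business_days(trading_days))
```

With downloaded data, `missing_business_days(df.index)` would list the equivalent gaps.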
The start and end parameters restrict the data to a specific range.
Dates should be strings in the format ‘YEAR-MONTH-DAY’, where the month is numerical (e.g., ‘1995-1-1’).
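The yfinance documentation describes end as exclusive, so a range meant to include its last day needs one extra day. The hypothetical helper `history_range` below builds the start/end strings from Python date objects (isoformat() yields zero-padded ‘YEAR-MONTH-DAY’ strings) and adds that day. It only builds keyword arguments and does not contact Yahoo Finance.

```python
from datetime import date, timedelta


def history_range(first: date, last: date) -> dict:
    """Return start/end keyword arguments that cover first..last inclusive.

    Assumes `end` is exclusive, as the yfinance docs describe, so one day
    is added to `last`.
    """
    return {
        "start": first.isoformat(),                     # e.g. '1995-01-01'
        "end": (last + timedelta(days=1)).isoformat(),  # day after `last`
    }


kwargs = history_range(date(1995, 1, 1), date(2000, 12, 31))
print(kwargs)  # {'start': '1995-01-01', 'end': '2001-01-01'}

# Usage with real data (needs yfinance and network access):
# df = yf.Ticker('AAPL').history(**kwargs)
```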
Import Data
By default, history() returns only the last month of daily price data.
df = yf.Ticker('AAPL').history()
df.head().round(2)
Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
---|---|---|---|---|---|---|---|
2024-12-09 00:00:00-05:00 | 241.83 | 247.24 | 241.75 | 246.75 | 44649200 | 0.0 | 0.0 |
2024-12-10 00:00:00-05:00 | 246.89 | 248.21 | 245.34 | 247.77 | 36914800 | 0.0 | 0.0 |
2024-12-11 00:00:00-05:00 | 247.96 | 250.80 | 246.26 | 246.49 | 45205800 | 0.0 | 0.0 |
2024-12-12 00:00:00-05:00 | 246.89 | 248.74 | 245.68 | 247.96 | 32777500 | 0.0 | 0.0 |
2024-12-13 00:00:00-05:00 | 247.82 | 249.29 | 246.24 | 248.13 | 33155300 | 0.0 | 0.0 |
Data is available only for trading days, when the stock market is open.
# 21 trading days and 7 columns
df.shape
(21, 7)
Setting period='max' returns all available historical data for a stock.
df = yf.Ticker('AAPL').history(period='max')
df.head().round(2)
Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
---|---|---|---|---|---|---|---|
1980-12-12 00:00:00-05:00 | 0.10 | 0.10 | 0.10 | 0.10 | 469033600 | 0.0 | 0.0 |
1980-12-15 00:00:00-05:00 | 0.09 | 0.09 | 0.09 | 0.09 | 175884800 | 0.0 | 0.0 |
1980-12-16 00:00:00-05:00 | 0.09 | 0.09 | 0.09 | 0.09 | 105728000 | 0.0 | 0.0 |
1980-12-17 00:00:00-05:00 | 0.09 | 0.09 | 0.09 | 0.09 | 86441600 | 0.0 | 0.0 |
1980-12-18 00:00:00-05:00 | 0.09 | 0.09 | 0.09 | 0.09 | 73449600 | 0.0 | 0.0 |
To get daily data between ‘1995-1-1’ and ‘2000-12-31’, pass the dates to start and end in the form ‘YEAR-MONTH-DAY’.
df = yf.Ticker('AAPL').history(start='1995-1-1', end='2000-12-31')
df.head().round(2)
Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
---|---|---|---|---|---|---|---|
1995-01-03 00:00:00-05:00 | 0.29 | 0.29 | 0.28 | 0.29 | 103868800 | 0.0 | 0.0 |
1995-01-04 00:00:00-05:00 | 0.29 | 0.30 | 0.29 | 0.29 | 158681600 | 0.0 | 0.0 |
1995-01-05 00:00:00-05:00 | 0.29 | 0.29 | 0.29 | 0.29 | 73640000 | 0.0 | 0.0 |
1995-01-06 00:00:00-05:00 | 0.31 | 0.32 | 0.31 | 0.31 | 1076622400 | 0.0 | 0.0 |
1995-01-09 00:00:00-05:00 | 0.31 | 0.31 | 0.31 | 0.31 | 274086400 | 0.0 | 0.0 |
Remove the last two columns (Dividends and Stock Splits).
df = yf.Ticker('AAPL').history(start='1995-1-1', end='2000-12-31').iloc[:,:-2]
df.head().round(2)
Date | Open | High | Low | Close | Volume |
---|---|---|---|---|---|
1995-01-03 00:00:00-05:00 | 0.29 | 0.29 | 0.28 | 0.29 | 103868800 |
1995-01-04 00:00:00-05:00 | 0.29 | 0.30 | 0.29 | 0.29 | 158681600 |
1995-01-05 00:00:00-05:00 | 0.29 | 0.29 | 0.29 | 0.29 | 73640000 |
1995-01-06 00:00:00-05:00 | 0.31 | 0.32 | 0.31 | 0.31 | 1076622400 |
1995-01-09 00:00:00-05:00 | 0.31 | 0.31 | 0.31 | 0.31 | 274086400 |
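Slicing with iloc[:, :-2] relies on Dividends and Stock Splits being the last two columns. Dropping them by name states the intent directly and does not depend on column order. The sketch below uses a small frame built from the first two (rounded) rows of the 1995 data above; errors='ignore' is a standard DataFrame.drop option that skips any listed column that is absent.

```python
import pandas as pd

# Small frame built from the first two rounded rows of the 1995 data
df = pd.DataFrame(
    {
        "Open": [0.29, 0.29],
        "High": [0.29, 0.30],
        "Low": [0.28, 0.29],
        "Close": [0.29, 0.29],
        "Volume": [103868800, 158681600],
        "Dividends": [0.0, 0.0],
        "Stock Splits": [0.0, 0.0],
    },
    index=pd.to_datetime(["1995-01-03", "1995-01-04"]),
)

# Drop the two columns by name; errors="ignore" skips any that are missing
df = df.drop(columns=["Dividends", "Stock Splits"], errors="ignore")
print(df.columns.tolist())  # ['Open', 'High', 'Low', 'Close', 'Volume']
```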
Remove Time
In this part, we remove the time and UTC offset from the index dates.
# Reset the index and set the previous index as the Date column.
df.reset_index(inplace=True)
df.head().round(2)
| | Date | Open | High | Low | Close | Volume |
---|---|---|---|---|---|---|
0 | 1995-01-03 00:00:00-05:00 | 0.29 | 0.29 | 0.28 | 0.29 | 103868800 |
1 | 1995-01-04 00:00:00-05:00 | 0.29 | 0.30 | 0.29 | 0.29 | 158681600 |
2 | 1995-01-05 00:00:00-05:00 | 0.29 | 0.29 | 0.29 | 0.29 | 73640000 |
3 | 1995-01-06 00:00:00-05:00 | 0.31 | 0.32 | 0.31 | 0.31 | 1076622400 |
4 | 1995-01-09 00:00:00-05:00 | 0.31 | 0.31 | 0.31 | 0.31 | 274086400 |
# Use the .dt.date accessor to keep only the date part and assign it back to the 'Date' column.
df['Date'] = df['Date'].dt.date
df.head().round(2)
| | Date | Open | High | Low | Close | Volume |
---|---|---|---|---|---|---|
0 | 1995-01-03 | 0.29 | 0.29 | 0.28 | 0.29 | 103868800 |
1 | 1995-01-04 | 0.29 | 0.30 | 0.29 | 0.29 | 158681600 |
2 | 1995-01-05 | 0.29 | 0.29 | 0.29 | 0.29 | 73640000 |
3 | 1995-01-06 | 0.31 | 0.32 | 0.31 | 0.31 | 1076622400 |
4 | 1995-01-09 | 0.31 | 0.31 | 0.31 | 0.31 | 274086400 |
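Converting the dates to datetime.date objects leaves the Date column with object dtype, so pandas' date-aware tools such as the .dt accessor and resample no longer apply to it. An alternative, sketched below on a small hypothetical frame, keeps a DatetimeIndex and only drops the timezone and the time of day. tz_localize(None) keeps the local wall-clock time (here, New York midnight) rather than converting to UTC, and normalize() zeroes out any remaining time of day.

```python
import pandas as pd

# Hypothetical index shaped like the one history() returns:
# midnight timestamps with a New York UTC offset
idx = pd.DatetimeIndex(["1995-01-03", "1995-01-04", "1995-01-05"], name="Date")
idx = idx.tz_localize("America/New_York")
df = pd.DataFrame({"Close": [0.29, 0.29, 0.29]}, index=idx)

# Drop the timezone while keeping local time, then zero out any time of day
df.index = df.index.tz_localize(None).normalize()

print(df.index[0])           # 1995-01-03 00:00:00
print(df.loc["1995"].shape)  # (3, 1)
```

Date-string slicing such as df.loc["1995"] keeps working because the index is still datetime-typed.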