Coding#

Data Preparation#

  • We will use the yfinance package to import stock data from Yahoo Finance, along with the pandas and NumPy packages.

import yfinance as yf
import pandas as pd
import numpy as np
  • We will use the following constants throughout this notebook. Using dictionaries allows us to store key–value pairs within a single data structure.

STOCK_DICT = {'Apple': 'AAPL', 'Tesla': 'TSLA', 'Amazon': 'AMZN', 'Visa': 'V', 'Microsoft': 'MSFT'}
START = '2015-1-1'
END = '2020-12-31'
  • It is good practice to keep all the Close values of the stocks we are considering in a single DataFrame, as this makes it easier to access them.

df = pd.DataFrame()

for name, symbol in STOCK_DICT.items():
    df[name] = yf.Ticker(symbol).history(start=START, end=END).Close
    
df.head().round(2)

If the code above does not work due to a YFRateLimitError, you can load the data from the following URL using the pandas read_csv() function.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datasmp/datasets/refs/heads/main/close_stock_data_raw.csv',
                parse_dates = ['Date'])
df['Date'] = pd.to_datetime(df['Date'], utc=True)
df.set_index('Date', inplace=True)
df.head()
Apple Tesla Amazon Visa Microsoft
Date
2015-01-02 05:00:00+00:00 24.261047 14.620667 15.4260 61.462486 39.933056
2015-01-05 05:00:00+00:00 23.577578 14.006000 15.1095 60.105762 39.565842
2015-01-06 05:00:00+00:00 23.579796 14.085333 14.7645 59.718475 38.985107
2015-01-07 05:00:00+00:00 23.910429 14.063333 14.9210 60.518589 39.480442
2015-01-08 05:00:00+00:00 24.829128 14.041333 15.0230 61.330284 40.641880
  • To remove the time portion from the date values, we first reset the index (row labels). This converts the index, which contains the Date values, into a column. Then, we use .dt.date to extract only the date part from each value in that column.

df.reset_index(inplace=True)
df['Date'] = df.Date.dt.date
df.set_index('Date', inplace=True)
df.head().round(2)
Apple Tesla Amazon Visa Microsoft
Date
2015-01-02 24.26 14.62 15.43 61.46 39.93
2015-01-05 23.58 14.01 15.11 60.11 39.57
2015-01-06 23.58 14.09 14.76 59.72 38.99
2015-01-07 23.91 14.06 14.92 60.52 39.48
2015-01-08 24.83 14.04 15.02 61.33 40.64
  • The info() method in pandas provides basic information about a DataFrame, such as the number of entries, column names, non-null counts, and data types.

df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1510 entries, 2015-01-02 to 2020-12-30
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Apple      1510 non-null   float64
 1   Tesla      1510 non-null   float64
 2   Amazon     1510 non-null   float64
 3   Visa       1510 non-null   float64
 4   Microsoft  1510 non-null   float64
dtypes: float64(5)
memory usage: 70.8+ KB
  • The built-in len() function returns the number of rows in a DataFrame.

len(df)
1510
  • A DataFrame’s shape attribute returns the number of rows and columns as a tuple.

df.shape
(1510, 5)
  • The describe() method in pandas provides basic descriptive statistics for each column of a DataFrame. These include

    • number of values (count)

    • mean

    • standard deviation

    • minimum value

    • 25th percentile (25% of the values are less than or equal to this value)

    • 50th percentile (also called the median)

    • 75th percentile (75% of the values are less than or equal to this value)

    • maximum value

df.describe()
Apple Tesla Amazon Visa Microsoft
count 1510.000000 1510.000000 1510.000000 1510.000000 1510.000000
mean 45.501393 30.976507 68.798972 117.331868 92.857470
std 24.912472 37.139818 39.538892 45.824109 50.984855
min 20.624054 9.578000 14.347500 57.134926 34.501617
25% 27.004882 15.139666 36.388124 73.924759 48.993045
50% 38.962189 18.944000 59.735750 108.241325 79.382622
75% 51.278791 23.162333 91.450748 159.760967 127.795855
max 133.190170 231.666672 176.572495 211.000885 222.111893
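  • As a quick check, the percentile values reported by describe() can be reproduced with NumPy, which by default uses the same linear-interpolation definition of a percentile as pandas. For example, the 25th percentile of the Apple column:

np.percentile(df['Apple'], 25)   # should match the 25% value shown by describe() above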

Log Return#

The log return \(r_t\) is calculated as:

\(\displaystyle r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(P_t) - \ln(P_{t-1})\)

where \(P_t\) is the current price, \(P_{t-1}\) is the previous price, and \(\ln\) denotes the natural logarithm.

Why Use Log Returns?

  • Time Additivity

    • Log returns can be summed across time.

    • Example: The log return over a year is just the sum of monthly log returns. This is not true for simple (arithmetic) returns.

    • \(r_t + r_{t+1} = \ln(P_t) - \ln(P_{t-1}) + \ln(P_{t+1}) - \ln(P_{t}) = \ln(P_{t+1}) - \ln(P_{t-1})\)

  • Statistical Properties

    • Log returns often approximate a normal distribution better than simple returns, which is useful for many statistical and financial models.

  • Symmetry

    • Percentage changes (simple returns) are asymmetric: a +10% gain and a −10% loss don’t cancel out.

    • If \(P\) is the initial value, then after a +10% gain followed by a −10% loss the final value is \(P\times 1.1\times 0.9 = 0.99P\).

    • Correspondingly, \(\ln(1.1) + \ln(0.9) = \ln(0.99) \approx -0.01\).

    • The difference is in how returns combine mathematically.

    • With simple returns, you need to multiply growth factors: \((1+r_1)(1+r_2)\)

    • With log returns, you just add them: \(r_1 + r_2\)

    • Log returns are symmetric in relative changes, which makes them easier to analyze; a quick numerical check of these properties follows below.
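
  • The additivity and symmetry properties above can be verified numerically. The following is a quick sanity check (not part of the data pipeline), using a made-up price series that gains 10% and then loses 10%:

p = np.array([100.0, 110.0, 99.0])   # toy prices: +10% then -10%
log_r = np.log(p[1:] / p[:-1])       # per-period log returns
log_r.sum()                          # ln(1.1) + ln(0.9) = ln(0.99), about -0.01
np.log(p[-1] / p[0])                 # total log return over the whole period: same value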

In the following code, the shift(n) method moves the values of a column down by n rows, so that in each row the shifted values represent past values.

df_toy = pd.DataFrame([1,2,3,4], columns=['Initial'], index=['day1', 'day2', 'day3', 'day4' ])
df_toy['shift_1'] = df_toy.Initial.shift(1)
df_toy['shift_2'] = df_toy.Initial.shift(2)
df_toy
Initial shift_1 shift_2
day1 1 NaN NaN
day2 2 1.0 NaN
day3 3 2.0 1.0
day4 4 3.0 2.0
  • Now we will use the shift() method to calculate the log returns.

df_log = np.log(df/df.shift(1))
df_log.dropna(inplace=True)
df_log.head().round(3)
Apple Tesla Amazon Visa Microsoft
Date
2015-01-05 -0.029 -0.043 -0.021 -0.022 -0.009
2015-01-06 0.000 0.006 -0.023 -0.006 -0.015
2015-01-07 0.014 -0.002 0.011 0.013 0.013
2015-01-08 0.038 -0.002 0.007 0.013 0.029
2015-01-09 0.001 -0.019 -0.012 -0.015 -0.008
  • The base of np.log() is Euler’s number \(e\), which is a mathematical constant similar to \(\pi\) and approximately equal to 2.718.

np.log(10) 
2.302585092994046
np.e
2.718281828459045
np.log(np.e) 
1.0

Percentage Change#

If you prefer to use percentage changes instead of log returns, you can use the pandas pct_change() method.

df_pct = df.pct_change()
df_pct.dropna(inplace=True)
df_pct.head().round(3)
Apple Tesla Amazon Visa Microsoft
Date
2015-01-05 -0.028 -0.042 -0.021 -0.022 -0.009
2015-01-06 0.000 0.006 -0.023 -0.006 -0.015
2015-01-07 0.014 -0.002 0.011 0.013 0.013
2015-01-08 0.038 -0.002 0.007 0.013 0.029
2015-01-09 0.001 -0.019 -0.012 -0.015 -0.008

Lagged Data#

One way to predict future stock prices is by using a certain number of previous values. To do this, we will prepare a DataFrame that contains the log returns along with their lagged versions up to a certain window (lag) length.

  • The following function generates lagged values as new columns.

    • The parameter data is the DataFrame that contains values for various stocks.

    • The parameter name specifies the column name (the stock for which lagged data will be generated as a new DataFrame).

    • The parameter lag defines the number of lags up to which lagged columns will be generated.

def lag_func(data, name, lag):
    df_lag = pd.DataFrame(data[name])                  # start with the selected stock's column
    for i in range(1, lag+1):
        df_lag[f'lag_{i}'] = df_lag[name].shift(i)     # value from i rows earlier
        df_lag.dropna(inplace=True)                    # remove rows containing NaN values
    return df_lag
df_log.Visa.head(10)
Date
2015-01-05   -0.022321
2015-01-06   -0.006464
2015-01-07    0.013309
2015-01-08    0.013323
2015-01-09   -0.014934
2015-01-12   -0.001959
2015-01-13    0.002918
2015-01-14   -0.020220
2015-01-15   -0.009554
2015-01-16    0.007164
Name: Visa, dtype: float64
lag_func(df_log, 'Visa', 2).head()
Visa lag_1 lag_2
Date
2015-01-08 0.013323 0.013309 -0.006464
2015-01-09 -0.014934 0.013323 0.013309
2015-01-12 -0.001959 -0.014934 0.013323
2015-01-13 0.002918 -0.001959 -0.014934
2015-01-14 -0.020220 0.002918 -0.001959
  • We can generate a dictionary with stock names as keys and DataFrames (containing the log returns and their lagged versions) as the corresponding values, allowing us to keep all the data in a single dictionary.

df_dict = {}
for name in STOCK_DICT.keys():
  df_dict[name] = lag_func(df_log, name, 10)
df_dict.keys()
dict_keys(['Apple', 'Tesla', 'Amazon', 'Visa', 'Microsoft'])
df_dict['Visa'].head()
Visa lag_1 lag_2 lag_3 lag_4 lag_5 lag_6 lag_7 lag_8 lag_9 lag_10
Date
2015-03-25 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022 0.018079 -0.001698
2015-03-26 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022 0.018079
2015-03-27 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022
2015-03-30 0.001829 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943
2015-03-31 -0.003815 0.001829 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944

Input and Output Data#

In this section, we will prepare the input and output data that will be used to build the Machine Learning models for a single stock (Visa). You can apply the same process to multiple stocks together by using a for loop.

Input#

  • The input data consists of the lagged values, which represent past returns, and the output data consists of the current log returns (the column labeled with the name of the stock, here Visa).

df_visa = df_dict['Visa']
df_visa.head()
Visa lag_1 lag_2 lag_3 lag_4 lag_5 lag_6 lag_7 lag_8 lag_9 lag_10
Date
2015-03-25 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022 0.018079 -0.001698
2015-03-26 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022 0.018079
2015-03-27 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022
2015-03-30 0.001829 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943
2015-03-31 -0.003815 0.001829 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944
  • iloc[:, 1:] selects all rows and all columns starting from index 1 (the second column), which in this case is the lag_1 column.

  • This provides all the lagged values, representing the past log returns, and will be used as the input data.

df_visa.iloc[:,1:].head()
lag_1 lag_2 lag_3 lag_4 lag_5 lag_6 lag_7 lag_8 lag_9 lag_10
Date
2015-03-25 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022 0.018079 -0.001698
2015-03-26 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022 0.018079
2015-03-27 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943 -0.017022
2015-03-30 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944 0.014943
2015-03-31 0.001829 -0.000762 -0.002132 -0.020629 0.000298 -0.004908 0.008941 -0.001608 0.011914 -0.016944

The values attribute removes the labels (row and column indices) and returns the underlying data as a NumPy array, similar to a matrix.

X = df_visa.iloc[:,1:].values
type(X), X.shape
(numpy.ndarray, (1454, 10))

Output#

Output data in Machine Learning can generally take two different forms: continuous or categorical.

  • Continuous data means that output values can take any value within a range, such as a price, a percentage change, or a log return. These values are numerical and can be measured on a continuous scale.

  • Categorical data, on the other hand, represents discrete classes or labels, such as increasing vs. decreasing, or buy, hold, sell. These values do not represent magnitudes but categories.

  • When the output data is continuous, we use regressor algorithms, and the task is called regression. When the output data is categorical, we use classifier algorithms, and the task is called classification.

  • Therefore, it is essential to clearly identify the type of output data before building a model, so that we can choose the most appropriate Machine Learning algorithm depending on whether the task is regression or classification.

Regression#

Log return values are continuous data. Therefore, if the output variable in a model is log returns, the output is continuous, and regression algorithms should be used.

yR = df_visa['Visa'].values
yR.shape
(1454,)
yR[:5]
array([-0.02062898, -0.00213218, -0.0007624 ,  0.00182934, -0.0038147 ])

Classification#

If we want to perform classification and predict the behavior of the stock price, whether it increases or decreases, we need to convert the output data into labels (increasing and decreasing). We will encode these two labels numerically as 0 (decreasing or flat) and 1 (increasing).

yC = np.where(yR > 0, 1, 0)
yC
array([0, 0, 0, ..., 1, 1, 1])

NumPy’s bincount() function counts the number of occurrences of each non-negative integer in an array. For example, if the output array contains only 0s and 1s, np.bincount() will return the count of 0s as the first element and the count of 1s as the second element.

# example
np.bincount([0,0,1,1,1,2])
array([2, 3, 1])
  • The number of 0s and 1s in yC.

np.bincount(yC)
array([642, 812])
  • You can also use the Counter() class from the collections module to count the number of occurrences of 0 and 1.

import collections
collections.Counter(yC)
Counter({1: 812, 0: 642})

Split Data#

In the following code, we will split the entire dataset into three different subsets:

  • Training Data (90%): This portion of the data is used to train and build the Machine Learning models.

  • Validation Data (5%): Defined by the validation ratio vr in the code below, this subset is used to evaluate and compare different models that were built using the training data. It helps in selecting the best-performing model and tuning hyperparameters.

  • Test Data (5%): The remaining portion of the data is used to assess the performance of the final model chosen based on validation results. This set is never used during training or model selection, only for the final performance check.

N = len(df_visa) # total number of rows

tr = 0.90 # train ratio
vr = (1-tr)/2 # validation ratio

ts = int(N*tr) # training size
vs = int(N*vr) # validation size

X_train, yR_train, yC_train = X[:ts], yR[:ts], yC[:ts]
X_valid, yR_valid, yC_valid = X[ts:ts+vs], yR[ts:ts+vs], yC[ts:ts+vs]
X_test , yR_test , yC_test  = X[ts+vs:], yR[ts+vs:], yC[ts+vs:]
X_train.shape, yR_train.shape, yC_train.shape
((1308, 10), (1308,), (1308,))
X_valid.shape, yR_valid.shape, yC_valid.shape
((72, 10), (72,), (72,))
X_test.shape, yR_test.shape, yC_test.shape
((74, 10), (74,), (74,))

Machine Learning#

In this section, we will use different Machine Learning algorithms to make predictions and evaluate their performance on the validation set for comparison.

For the models that we import from scikit-learn, including K-Nearest Neighbors (KNN), Decision Tree, Random Forest, and Multi-layer Perceptron (MLP), the fit() method performs the training step by using the input and output data from the training dataset.

  • Once trained, the predict() method generates predictions for the given input values.

  • For regression models, we use the Root Mean Squared Error (RMSE) to measure the difference between the predicted values and the actual values.

  • For classification models, the score() method combines the prediction step and evaluation, returning the accuracy score, which is the proportion of correctly classified samples (in this case, trading days).

This approach allows us to fairly compare different models by using the appropriate evaluation metric for the type of problem (regression or classification).
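
  • As a quick reference, both metrics can also be written out directly with NumPy. The following is a minimal sketch using made-up toy arrays, not the actual model outputs:

y_true = np.array([0.010, -0.020, 0.005])
y_pred = np.array([0.008, -0.015, 0.002])
np.sqrt(np.mean((y_true - y_pred)**2))   # RMSE: square root of the mean squared error

labels_true = np.array([1, 0, 1, 1])
labels_pred = np.array([1, 0, 0, 1])
np.mean(labels_true == labels_pred)      # accuracy: fraction of correctly classified samples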

For more information on Machine Learning algorithms, please check the following online book: Introduction to Machine Learning.

KNN#

For more information on KNN, please see the KNN chapter.

from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import root_mean_squared_error as rmse
knnR = KNeighborsRegressor()
knnR.fit(X_train, yR_train)
pred_valid = knnR.predict(X_valid)
rmse(pred_valid, yR_valid)
0.01686889097515809
knnC = KNeighborsClassifier()
knnC.fit(X_train, yC_train)
knnC.score(X_valid, yC_valid)
0.5277777777777778

Decision Tree#

For more information on Decision Trees, please see the Decision Tree chapter.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
dtR = DecisionTreeRegressor(random_state=0)
dtR.fit(X_train, yR_train)
pred_valid = dtR.predict(X_valid)
rmse(pred_valid, yR_valid)
0.023515643972095466
dtC = DecisionTreeClassifier()
dtC.fit(X_train, yC_train)
dtC.score(X_valid, yC_valid)
0.5416666666666666

Random Forest#

For more information on Random Forests, please see the Random Forest chapter.

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
rfR = RandomForestRegressor(random_state=0)
rfR.fit(X_train, yR_train)
pred_valid = rfR.predict(X_valid)
rmse(pred_valid, yR_valid)
0.017484014073654453
rfC = RandomForestClassifier()
rfC.fit(X_train, yC_train)
rfC.score(X_valid, yC_valid)
0.5416666666666666

MLP#

For more information on the Multi-layer Perceptron, please see the MLP chapter.

from sklearn.neural_network import MLPClassifier, MLPRegressor
mlpR = MLPRegressor(random_state=0)
mlpR.fit(X_train, yR_train)
pred_valid = mlpR.predict(X_valid)
rmse(pred_valid, yR_valid)
0.015456051693813764
mlpC = MLPClassifier(random_state=0, max_iter=500)
mlpC.fit(X_train, yC_train)
mlpC.score(X_valid, yC_valid)
0.7083333333333334

Test Data#

Among the models we’ve evaluated so far, the MLP shows the best performance on the validation data. Now, let’s check the performance of this best model (MLP) on the test set. Since model selection is complete and the separate validation set is no longer needed, let’s retrain the MLP on the combined training + validation data and then evaluate it on the test data.

X_train.shape, X_valid.shape
((1308, 10), (72, 10))
  • NumPy’s vstack() function stacks arrays vertically, meaning it places one array below the other.

X_train_valid = np.vstack([X_train, X_valid])
X_train_valid.shape
(1380, 10)
yR_train.shape, yR_valid.shape
((1308,), (72,))
  • NumPy’s hstack() function stacks arrays horizontally, meaning it places one array to the right of the other.

yR_train_valid = np.hstack([yR_train, yR_valid])
yR_train_valid.shape
(1380,)
yC_train_valid = np.hstack([yC_train, yC_valid])
yC_train_valid.shape
(1380,)
mlpR = MLPRegressor(random_state=0)
mlpR.fit(X_train_valid, yR_train_valid)
pred_test = mlpR.predict(X_test)
rmse(pred_test, yR_test)
0.016404914243523118
mlpC = MLPClassifier(random_state=0, max_iter=1000)
mlpC.fit(X_train_valid, yC_train_valid)
mlpC.score(X_test, yC_test)
0.5540540540540541

Keras#

Keras is a high-level neural network library that runs on top of TensorFlow, offering a user-friendly API, similar in spirit to scikit-learn, for constructing neural networks in Python.

from tensorflow import keras

Feedforward (FNN / MLP)#

In feedforward neural network structures, data flows from the input to the output by passing through the neurons without returning to a previous neuron.

Regression#

The following model takes 10 input features (the 10 lagged returns) and consists of two hidden layers with 100 and 200 neurons respectively, followed by an output layer with a single neuron.

model = keras.models.Sequential([
    keras.layers.Input((10,)),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(200, activation='relu'),
    keras.layers.Dense(1)])
model.summary()
Model: "sequential_9"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_27 (Dense)                │ (None, 100)            │         1,100 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_28 (Dense)                │ (None, 200)            │        20,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_29 (Dense)                │ (None, 1)              │           201 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 21,501 (83.99 KB)
 Trainable params: 21,501 (83.99 KB)
 Non-trainable params: 0 (0.00 B)

In Keras, the compile() step is where you configure the model for training.

  • The optimizer, metrics, and loss function can all be specified.

In the following code, we set only the loss function to mean squared error.

model.compile(loss = 'mse')

For scikit-learn algorithms, the fit() method is used to train the model. Typically, the model is trained only on the training set, while the validation set is kept separate and not used directly during fitting.

  • The validation set plays a key role in the training process:

  • It is used to evaluate the model’s performance on unseen data while the training is still in progress.

  • It helps detect overfitting, since performance on training data may improve even while performance on validation data deteriorates.

  • It is commonly used for hyperparameter tuning, either manually or through techniques like GridSearchCV or RandomizedSearchCV, where cross-validation splits serve as validation sets.

In contrast, frameworks like Keras allow you to pass a validation set directly in the fit() method (e.g., validation_data=(x_val, y_val)), so that performance on the validation set is reported at the end of each training epoch.

model.fit(X_train, yR_train, validation_data=(X_valid, yR_valid));
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 2.5526e-04 - val_loss: 2.7879e-04
model.predict(X_test[:5])
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step
array([[-0.00121496],
       [-0.00020361],
       [-0.00067773],
       [ 0.0009219 ],
       [ 0.00194272]], dtype=float32)
yR_test_predict = model.predict(X_test)
rmse(yR_test_predict , yR_test)
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 976us/step
0.016397975497796584

Classification#

Sigmoid#

In the binary case, meaning there are only two classes in the output, the output-layer activation can be chosen as the sigmoid function, which returns a single value representing the probability of being in class 1.
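
  • For reference, the sigmoid function is \(\sigma(x) = \frac{1}{1+e^{-x}}\), which maps any real number to the interval (0, 1). A minimal NumPy sketch, not used elsewhere in this notebook:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes any real number into (0, 1)

sigmoid(0.0)   # 0.5: the model is equally unsure about the two classes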

model = keras.models.Sequential([
    keras.layers.Input((10,)),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(200, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')]) 
model.summary()
Model: "sequential_7"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_21 (Dense)                │ (None, 100)            │         1,100 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_22 (Dense)                │ (None, 200)            │        20,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_23 (Dense)                │ (None, 1)              │           201 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 21,501 (83.99 KB)
 Trainable params: 21,501 (83.99 KB)
 Non-trainable params: 0 (0.00 B)
model.compile(loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, yC_train, validation_data=(X_valid, yC_valid));
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5386 - loss: 0.6911 - val_accuracy: 0.5972 - val_loss: 0.6898
model.predict(X_test[:5])
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step
array([[0.55796814],
       [0.5584805 ],
       [0.5597468 ],
       [0.55970466],
       [0.56186867]], dtype=float32)
yC_test_pred = np.where(model.predict(X_test)>0.5, 1, 0)
yC_test_pred[:5]
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step 
array([[1],
       [1],
       [1],
       [1],
       [1]])

Softmax#

The softmax activation function is used for multi-class classification tasks and returns the probability of each class.
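
  • Softmax exponentiates each raw output and divides by the sum of the exponentials, so the results are positive and add up to 1. A minimal NumPy sketch, independent of the Keras model below:

z = np.array([0.2, 1.0])        # raw outputs (logits) for the two classes
np.exp(z) / np.exp(z).sum()     # class probabilities that sum to 1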

model = keras.models.Sequential([
    keras.layers.Input((10,)),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(200, activation='relu'),
    keras.layers.Dense(2, activation='softmax')]) # binary case: only two classes
model.summary()
Model: "sequential_8"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense_24 (Dense)                │ (None, 100)            │         1,100 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_25 (Dense)                │ (None, 200)            │        20,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_26 (Dense)                │ (None, 2)              │           402 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 21,702 (84.77 KB)
 Trainable params: 21,702 (84.77 KB)
 Non-trainable params: 0 (0.00 B)
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, yC_train, validation_data=(X_valid, yC_valid));
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5283 - loss: 0.6912 - val_accuracy: 0.5972 - val_loss: 0.6914
model.predict(X_test[:5])
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step
array([[0.43344268, 0.56655735],
       [0.4247901 , 0.5752099 ],
       [0.42437348, 0.57562655],
       [0.42359856, 0.5764014 ],
       [0.4213894 , 0.57861066]], dtype=float32)
yC_test_pred = [np.argmax(i) for i in model.predict(X_test)]
yC_test_pred[:5]
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 874us/step
[1, 1, 1, 1, 1]

Recurrent Neural Network (RNN)#
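
In a recurrent neural network, the input is processed as a sequence: a hidden state is carried from one time step to the next, so earlier values in the sequence can influence the prediction made at later steps.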

Regression#

model = keras.models.Sequential([
    keras.layers.Input((None,1)),
    keras.layers.SimpleRNN(100, return_sequences=True),
    keras.layers.SimpleRNN(200),
    keras.layers.Dense(1)])
model.summary()
Model: "sequential_12"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ simple_rnn_4 (SimpleRNN)        │ (None, None, 100)      │        10,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ simple_rnn_5 (SimpleRNN)        │ (None, 200)            │        60,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_32 (Dense)                │ (None, 1)              │           201 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 70,601 (275.79 KB)
 Trainable params: 70,601 (275.79 KB)
 Non-trainable params: 0 (0.00 B)
model.compile(loss = 'mse')
model.fit(X_train, yR_train, validation_data=(X_valid, yR_valid));
41/41 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - loss: 0.0403 - val_loss: 2.8208e-04
model.predict(X_test[:5])
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
array([[0.00105258],
       [0.00105076],
       [0.00093284],
       [0.00070539],
       [0.00073511]], dtype=float32)
yR_test_predict = model.predict(X_test)
rmse(yR_test_predict , yR_test)
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step 
0.01601122283427286

Long Short-Term Memory (LSTM)#
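
An LSTM is a recurrent architecture whose gating mechanisms (input, forget, and output gates) control what information is kept in or removed from the cell state, which helps the network retain information over longer sequences than a simple RNN.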

Regression#

model = keras.models.Sequential([
    keras.layers.Input((None,1)),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.LSTM(200),
    keras.layers.Dense(1)])
model.summary()
Model: "sequential_16"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_12 (LSTM)                  │ (None, None, 100)      │        40,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_13 (LSTM)                  │ (None, 200)            │       240,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_36 (Dense)                │ (None, 1)              │           201 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 281,801 (1.07 MB)
 Trainable params: 281,801 (1.07 MB)
 Non-trainable params: 0 (0.00 B)
model.compile(loss = 'mse')
model.fit(X_train, yR_train, validation_data=(X_valid, yR_valid));
41/41 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - loss: 4.6015e-04 - val_loss: 3.3226e-04
model.predict(X_test[:5])
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 93ms/step
array([[0.00542205],
       [0.00540088],
       [0.00542711],
       [0.00555244],
       [0.00558671]], dtype=float32)
yR_test_predict = model.predict(X_test)
rmse(yR_test_predict , yR_test)
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step  
0.01676784618012104

Regression Model Construction#

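  • The following function builds an LSTM regression model with a configurable number of hidden layers (n_hiddens) and neurons per layer (n_neurons), so that these hyperparameters can be tuned later.
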
def build_model(n_hiddens=2, n_neurons=100, input_shape=1):
  model = keras.models.Sequential()
  model.add(keras.layers.InputLayer(shape=[None, input_shape]))

  for layer in range(n_hiddens-1):
    model.add(keras.layers.LSTM(n_neurons, return_sequences=True))  # intermediate LSTM layers return the full sequence

  model.add(keras.layers.LSTM(n_neurons))  # last LSTM layer returns only its final output
  model.add(keras.layers.Dense(1))         # single neuron for the regression output
  model.compile(loss='mse')
  return model
build_model(n_hiddens=5, n_neurons=1).summary()
Model: "sequential_17"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_14 (LSTM)                  │ (None, None, 1)        │            12 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_15 (LSTM)                  │ (None, None, 1)        │            12 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_16 (LSTM)                  │ (None, None, 1)        │            12 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_17 (LSTM)                  │ (None, None, 1)        │            12 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_18 (LSTM)                  │ (None, 1)              │            12 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_37 (Dense)                │ (None, 1)              │             2 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 62 (248.00 B)
 Trainable params: 62 (248.00 B)
 Non-trainable params: 0 (0.00 B)
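  • The scikeras KerasRegressor wrapper gives a Keras model built by a function like build_model a scikit-learn style interface (fit(), predict(), get_params()), which is what allows it to be used with scikit-learn utilities such as RandomizedSearchCV below.
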
from scikeras.wrappers import KerasRegressor
lstm_keras_reg = KerasRegressor(build_model, n_hiddens=2, n_neurons=100)
lstm_keras_reg.fit(X_train, yR_train, validation_data=(X_valid, yR_valid), batch_size=16, epochs=2)

yR_test_predict = model.predict(X_test)
rmse(yR_test_predict , yR_test)
Epoch 1/2
82/82 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 4.0597e-04 - val_loss: 3.9824e-04
Epoch 2/2
82/82 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 3.4419e-04 - val_loss: 2.7584e-04
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
0.01676784618012104

RandomizedSearchCV#
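
RandomizedSearchCV samples n_iter hyperparameter combinations from param_distribs and evaluates each one with cv-fold cross-validation on the training data; the best combination found is then available through the best_params_ attribute.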

from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hiddens": [1, 2, 3],
    "n_neurons": [50, 100, 150],
}

rnd_search_cv = RandomizedSearchCV(lstm_keras_reg, param_distribs, n_iter=5, cv=3)
rnd_search_cv.fit(X_train, yR_train, validation_data=(X_valid, yR_valid))
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 3.9042e-04 - val_loss: 2.8660e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 4.8826e-04 - val_loss: 3.3580e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - loss: 2.3440e-04 - val_loss: 6.1862e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - loss: 4.9173e-04 - val_loss: 2.8557e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - loss: 5.8294e-04 - val_loss: 3.8028e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - loss: 3.5741e-04 - val_loss: 3.7224e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 4.0899e-04 - val_loss: 3.9309e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 4.6213e-04 - val_loss: 3.1873e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 2.9718e-04 - val_loss: 2.8406e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - loss: 5.5350e-04 - val_loss: 3.9404e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - loss: 6.0026e-04 - val_loss: 5.7858e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - loss: 3.8712e-04 - val_loss: 8.2294e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 4.6308e-04 - val_loss: 4.4643e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - loss: 5.5476e-04 - val_loss: 2.8565e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 2.9283e-04 - val_loss: 3.8419e-04
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
41/41 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - loss: 4.7282e-04 - val_loss: 4.5706e-04
RandomizedSearchCV(cv=3,
                   estimator=KerasRegressor(model=<function build_model at 0x2b50ba3e0>, n_hiddens=2, n_neurons=100),
                   n_iter=5,
                   param_distributions={'n_hiddens': [1, 2, 3],
                                        'n_neurons': [50, 100, 150]})
rnd_search_cv.best_params_
{'n_neurons': 100, 'n_hiddens': 2}