Neural Networks#

Neural Networks (NNs) are a subset of Machine Learning (ML) models loosely inspired by the structure of the brain’s biological neural networks.

  • At the core of this structure is the perceptron, a mathematical representation of a biological neuron.

  • Similar to the cerebral cortex, a neural network can have multiple layers of interconnected perceptrons.

  • Input values, or underlying data, pass through these “hidden layers” until they converge at the output layer.

  • The output layer provides the final prediction, which could be a single node if the model outputs a number or multiple nodes in the case of a multiclass classification problem.

  • The hidden layers in a NN perform transformations on the data to discern its relationship with the target variable.

  • Each connection into a node is assigned a weight, which multiplies the input value passing along it.

  • As data passes through successive layers, the network transforms it into a representation that is useful for the prediction task, as illustrated in the sketch after this list.

  • The process of determining the optimal weights for these nodes is typically achieved using an algorithm called Backpropagation.
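
To make the flow of weighted inputs through layers concrete, here is a minimal NumPy sketch of a forward pass; the layer sizes and random values are purely illustrative and not taken from any model in this chapter.

import numpy as np

rng = np.random.default_rng(0)

x  = rng.normal(size=4)          # one example with 4 input features
W1 = rng.normal(size=(4, 3))     # weights from the inputs to a 3-node hidden layer
W2 = rng.normal(size=(3, 1))     # weights from the hidden layer to a single output node

hidden = np.maximum(x @ W1, 0)   # weighted sums, then a ReLU activation
output = hidden @ W2             # the network's prediction for this example
print(output)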

History: The development of this field began around 1940, followed by a pause in progress around 1960.

  • It resumed in the 1980s but encountered another break in the 1990s.

  • A significant resurgence occurred in the 2010s, driven by substantial improvements in

    • data availability

    • computational power (especially with the advent of GPU cards)

    • advancements in theoretical frameworks.

Application: Some notable applications of neural networks include

  • autonomous vehicles, which rely on these models for navigation and decision-making

  • real-time translation systems that enable instant language conversion

  • Google DeepMind’s deep-learning-based AlphaGo, which famously defeated the world’s top Go player.

Examples: There are many different neural network (NN) structures that have been developed, some of which are highly complex and designed to handle intricate datasets such as images and language translation. Below are a few examples:

Perceptron#

A perceptron is an algorithm used for supervised learning of binary classifiers.

  • It is essentially a single-layer neural network (a single artificial neuron) that computes a weighted combination of its input features to decide which of two classes an example belongs to.

The perceptron algorithm learns the weights for the input signals to draw a linear decision boundary. It consists of four main components:

  • Input values

  • Weights and bias

  • Net sum

  • Activation function

How does a perceptron work?

  • The process begins by multiplying each input value by its corresponding weight.

  • These multiplied values are then summed to create the weighted sum.

  • The weighted sum is passed through the activation function, which produces the perceptron’s output.

  • The activation function is crucial as it maps the output to the desired range of values.

  • The weight of an input reflects the strength of that input’s influence on the node’s output (a minimal sketch of this computation follows below).
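
A minimal NumPy sketch of the perceptron computation; the weights, bias, and inputs are made-up values for illustration, and the step activation simply checks whether the weighted sum exceeds zero.

import numpy as np

def perceptron_predict(x, w, b):
    net = np.dot(w, x) + b       # weighted sum of the inputs plus the bias
    return 1 if net > 0 else 0   # step activation: fire (1) only if the net sum is positive

x = np.array([2.0, -1.0, 0.5])   # input values
w = np.array([0.4, 0.8, -0.2])   # one weight per input
b = 0.1                          # bias term
print(perceptron_predict(x, w, b))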

Multilayer Perceptron (MLP)#

  • A multilayer perceptron is a feedforward network, meaning that all connections flow in the direction of the output, with no loops.

  • In contrast, some neural networks can have loops, and these are known as recurrent networks. Recurrent networks are generally more challenging to train compared to feedforward networks.

  • An artificial neural network that consists of an input layer, an output layer, and one or more hidden layers of perceptrons is called a Multilayer Perceptron (MLP).

  • These hidden layers contain trainable weights and are crucial for capturing complex patterns in the data.

  • Lower Layers: Layers that are closer to the input layer.

  • Upper Layers: Layers that are closer to the output layer.

  • Input Layer: Receives the data as input to build the neural network.

  • Hidden Layers: Perform complex operations and feature extraction.

    • Lower-Level Hidden Layers: Detect low-level structures, such as line segments and orientations.

    • Middle-Level Hidden Layers: Capture intermediate-level structures, like rectangles and circles.

    • Upper-Level Hidden Layers: Recognize complex structures, such as faces.

  • Output Layer: Generates the predicted value(s).

Activation Functions#

  • Activation functions are crucial for introducing non-linearity into neural networks, allowing them to model complex relationships.

  • They also convert the network’s output into a desired range, facilitating various tasks and interpretations.

  • The activation function of a perceptron applies a step rule that converts the numerical output into either +1 or -1, depending on whether the output of the weighting function is greater than zero.

  • The derivative of the step function is zero everywhere (and undefined at the threshold), which makes it unsuitable for gradient-based training.

  • The derivative of the ReLU (Rectified Linear Unit) function, however, is easy and fast to compute, making ReLU the default activation function for hidden layers in many neural networks.

  • Some of the most commonly used activation functions include:

    • Step (sign) function

    • Sigmoid (logistic)

    • Tanh

    • ReLU (Rectified Linear Unit)

    • Softmax (for multiclass output layers)
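
For reference, here is a minimal NumPy sketch of these functions; the definitions are standard and not tied to any particular library.

import numpy as np

def step(z):                     # maps to 0/1 depending on the sign of z
    return np.where(z > 0, 1, 0)

def sigmoid(z):                  # squashes values into (0, 1)
    return 1 / (1 + np.exp(-z))

def relu(z):                     # zero for negative inputs, identity otherwise
    return np.maximum(z, 0)

def softmax(z):                  # turns a vector into probabilities that sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(step(z), sigmoid(z), np.tanh(z), relu(z), softmax(z))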

sklearn Perceptron#

The sklearn Perceptron is a simple linear classifier; below it is applied to the binary breast-cancer classification task.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# binary classification dataset: 30 numeric features, target 0/1
X = load_breast_cancer().data
y = load_breast_cancer().target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

from sklearn.linear_model import Perceptron
clf = Perceptron(random_state=0)
clf.fit(X_train, y_train)
clf.score(X_train, y_train), clf.score(X_test, y_test)
(0.8497652582159625, 0.8531468531468531)
clf.intercept_
array([253.])
clf.coef_.shape
(1, 30)
# actual output
y_test[:5]
array([0, 1, 1, 1, 1])
# predicted classes
clf.predict(X_test[:5])
array([1, 1, 1, 1, 1])
clf.score(X_train, y_train)
0.8497652582159625
clf.score(X_test, y_test)
0.8531468531468531

sklearn MLP Classifier#

from sklearn.neural_network import MLPClassifier
# two hidden layers with 10 and 20 units, ReLU activations
clf = MLPClassifier(random_state=0, hidden_layer_sizes=[10, 20], activation='relu', max_iter=1000, verbose=True)
clf.fit(X_train, y_train)
clf.score(X_train, y_train), clf.score(X_test, y_test)
Iteration 1, loss = 16.48366333
Iteration 2, loss = 12.81163261
Iteration 3, loss = 9.38188348
Iteration 4, loss = 6.15284159
Iteration 5, loss = 3.06556933
...
Iteration 322, loss = 0.16657156
Iteration 323, loss = 0.15165789
Iteration 324, loss = 0.14354466
Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.
(0.9460093896713615, 0.9300699300699301)
len(clf.coefs_)
3
clf.coefs_[0].shape
(30, 10)
clf.coefs_[1].shape
(10, 20)
clf.coefs_[2].shape
(20, 1)
[i.shape for i in clf.coefs_]
[(30, 10), (10, 20), (20, 1)]
len(clf.intercepts_), clf.intercepts_[0].shape, clf.intercepts_[1].shape
(3, (10,), (20,))
len(clf.coefs_), clf.coefs_[0].shape, clf.coefs_[1].shape
(3, (30, 10), (10, 20))
clf.get_params()
{'activation': 'relu',
 'alpha': 0.0001,
 'batch_size': 'auto',
 'beta_1': 0.9,
 'beta_2': 0.999,
 'early_stopping': False,
 'epsilon': 1e-08,
 'hidden_layer_sizes': [10, 20],
 'learning_rate': 'constant',
 'learning_rate_init': 0.001,
 'max_fun': 15000,
 'max_iter': 1000,
 'momentum': 0.9,
 'n_iter_no_change': 10,
 'nesterovs_momentum': True,
 'power_t': 0.5,
 'random_state': 0,
 'shuffle': True,
 'solver': 'adam',
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': True,
 'warm_start': False}
clf.predict(X_train)[:6]
array([1, 1, 0, 1, 0, 1])
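
MLPs are sensitive to the scale of the input features, so a common refinement is to standardize the data before fitting. The sketch below reuses the same architecture inside a Pipeline; the exact scores are not reproduced here, but convergence and accuracy would typically improve.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# standardize the features, then fit the same two-hidden-layer MLP
pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(random_state=0, hidden_layer_sizes=[10, 20],
                                   activation='relu', max_iter=1000))
pipe.fit(X_train, y_train)
pipe.score(X_train, y_train), pipe.score(X_test, y_test)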

sklearn MLP Regressor#

from sklearn.datasets import fetch_california_housing

# regression dataset: 8 numeric features, continuous target (median house value)
X = fetch_california_housing().data
y = fetch_california_housing().target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
X.shape
(20640, 8)
from sklearn.neural_network import MLPRegressor
# same two-hidden-layer architecture, now fitting a continuous target
reg = MLPRegressor(random_state=0, hidden_layer_sizes=[10, 20], activation='relu', max_iter=1000, verbose=True)
reg.fit(X_train, y_train)
reg.score(X_train, y_train), reg.score(X_test, y_test)
Iteration 1, loss = 52.52882975
Iteration 2, loss = 1.32529916
Iteration 3, loss = 1.01549794
Iteration 4, loss = 0.93165872
Iteration 5, loss = 0.86093486
...
Iteration 116, loss = 0.32205754
Iteration 117, loss = 0.29220880
Iteration 118, loss = 0.29557400
Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.
(0.5686693538646728, 0.553300884080113)
len(reg.coefs_)
3
reg.coefs_[0].shape
(8, 10)
reg.coefs_[1].shape
(10, 20)
reg.coefs_[2].shape
(20, 1)
[i.shape for i in reg.coefs_]
[(8, 10), (10, 20), (20, 1)]
len(reg.intercepts_), reg.intercepts_[0].shape, reg.intercepts_[1].shape
(3, (10,), (20,))
len(reg.coefs_), reg.coefs_[0].shape, reg.coefs_[1].shape
(3, (8, 10), (10, 20))
reg.get_params()
{'activation': 'relu',
 'alpha': 0.0001,
 'batch_size': 'auto',
 'beta_1': 0.9,
 'beta_2': 0.999,
 'early_stopping': False,
 'epsilon': 1e-08,
 'hidden_layer_sizes': [10, 20],
 'learning_rate': 'constant',
 'learning_rate_init': 0.001,
 'max_fun': 15000,
 'max_iter': 1000,
 'momentum': 0.9,
 'n_iter_no_change': 10,
 'nesterovs_momentum': True,
 'power_t': 0.5,
 'random_state': 0,
 'shuffle': True,
 'solver': 'adam',
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': True,
 'warm_start': False}
reg.predict(X_train)[:6]
array([3.03326133, 2.90979905, 1.29352856, 2.88241277, 1.2781484 ,
       3.08224869])

Tensorflow#

TensorFlow (TF) is a free, open-source library for machine learning created by Google.

  • It supports both neural networks and traditional machine learning algorithms.

  • Initially designed for executing large-scale numerical computations.

  • Data is represented as tensors, which are multi-dimensional arrays.

  • Computations are performed as data flow graphs (a short example follows this list).

  • TensorFlow offers a robust and versatile ecosystem of tools, libraries, and community resources, enabling researchers to advance machine learning innovations and allowing developers to build and deploy machine learning-powered applications with ease.
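
As a small illustration of tensors and the computations performed on them, here is a minimal sketch (assuming a TensorFlow 2.x installation):

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # a 2x2 tensor
b = tf.constant([[1.0], [0.5]])             # a 2x1 tensor
c = tf.matmul(a, b)                         # matrix multiplication on tensors
print(c.numpy())                            # [[2.], [5.]]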

Some Examples: TensorFlow can train and run deep neural networks for a wide range of tasks, such as

  • image classification (including handwritten digit recognition)

  • object detection

  • natural language processing

  • sequence-to-sequence models for machine translation

Keras#

Keras is a high-level neural network library that runs on top of Theano or TensorFlow, offering a user-friendly API similar to scikit-learn for constructing neural networks in Python.

  • It enables developers to quickly build, train, evaluate, and deploy models without needing to delve into the underlying tensor algebra, numerical methods, or optimization techniques.

  • Keras can utilize TensorFlow or Theano as its computational backend, and TensorFlow now includes its own integrated version, known as tf.keras.

  • Additionally, Facebook’s PyTorch library has gained significant popularity, offering functionality similar to Keras.

import tensorflow as tf
from tensorflow import keras

Binary classification#

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=0)
# breast cancer dataset: binary output (0/1)
model = keras.models.Sequential([
    keras.layers.Input(shape=(30,)),               # input layer (30 features)
    keras.layers.Dense(10, activation='relu'),     # first hidden layer
    keras.layers.Dense(20, activation='relu'),     # second hidden layer
    keras.layers.Dense(1, activation='sigmoid'),   # output layer
])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃        Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 10)             │           310 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 20)             │           220 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1)              │            21 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 551 (2.15 KB)
 Trainable params: 551 (2.15 KB)
 Non-trainable params: 0 (0.00 B)
model = keras.models.Sequential([
    keras.layers.Input(shape=X_train.shape[1:]),
    keras.layers.Dense(40, activation='relu', name='first_dense'),
    keras.layers.Dense(1, activation='sigmoid')])

model.compile(loss="binary_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)

model.evaluate(X_test, y_test)
1/5 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step - accuracy: 0.6562 - loss: 0.6550

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 601us/step - accuracy: 0.6208 - loss: 0.6655
[0.6602327227592468, 0.6293706297874451]
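
The sigmoid output layer produces a probability between 0 and 1 for each sample; class labels can be recovered by thresholding at 0.5. A minimal sketch (not part of the recorded run above):

probs = model.predict(X_test)                 # shape (n_samples, 1), values in (0, 1)
y_pred = (probs > 0.5).astype(int).ravel()    # threshold at 0.5 to obtain 0/1 labels
y_pred[:5], y_test[:5]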

Regression#

from sklearn.datasets import fetch_california_housing
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = keras.models.Sequential([
    keras.layers.Input(shape=X_train.shape[1:]),
    keras.layers.Dense(50, activation='relu', name='first_dense'),
    keras.layers.Dense(1)])

model.compile(loss="mse", optimizer="adam")
model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)

model.evaluate(X_test, y_test)
  1/162 ━━━━━━━━━━━━━━━━━━━━ 3s 20ms/step - loss: 0.5423

162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 176us/step - loss: 1.3533
1.1719304323196411
from sklearn.metrics import r2_score
y_test_predict = model.predict(X_test)
r2_score(y_test, y_test_predict)   # r2_score expects (y_true, y_pred)
162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 202us/step
0.07795921851665988
y_train_predict = model.predict(X_train)
r2_score(y_train, y_train_predict)
484/484 ━━━━━━━━━━━━━━━━━━━━ 0s 158us/step
0.12894084238469106
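
The low R² here is partly because the raw features are on very different scales. A common refinement is to standardize the inputs; the sketch below does this with a Keras Normalization layer adapted to the training data (an alternative setup, not the run recorded above).

norm = keras.layers.Normalization()   # learns per-feature mean and variance
norm.adapt(X_train)

model = keras.models.Sequential([
    keras.layers.Input(shape=X_train.shape[1:]),
    norm,                             # standardize inputs inside the model
    keras.layers.Dense(50, activation='relu'),
    keras.layers.Dense(1)])

model.compile(loss="mse", optimizer="adam")
model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)
model.evaluate(X_test, y_test)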

Multiclass#

fashion_mnist = keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

X_train.shape, y_train.shape
((60000, 28, 28), (60000,))
class_labels = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
"Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
import matplotlib.pyplot as plt
plt.imshow(X_train[0], 'gray');
plt.imshow(X_train[1], 'gray');
import random
N = 5
M = 20
plt.figure(figsize=(20, 10))
for i in range(1, N*M + 1):
    plt.subplot(N, M, i)
    ind = random.randint(0, X_train.shape[0] - 1)   # random training-image index (randint is inclusive)
    plt.imshow(X_train[ind], 'gray')
    plt.title(class_labels[y_train[ind]])
    plt.axis('off');
from collections import Counter
Counter(y_train)
Counter({9: 6000,
         0: 6000,
         3: 6000,
         2: 6000,
         7: 6000,
         5: 6000,
         1: 6000,
         6: 6000,
         4: 6000,
         8: 6000})
Counter(y_test)
Counter({9: 1000,
         2: 1000,
         1: 1000,
         6: 1000,
         4: 1000,
         5: 1000,
         7: 1000,
         3: 1000,
         8: 1000,
         0: 1000})
# whole process
import numpy as np
model = keras.models.Sequential([
   keras.layers.Input( shape=[28,28]), 
   keras.layers.Flatten(),
   keras.layers.Dense( 300, activation='relu'),
   keras.layers.Dense( 50, activation='relu'),
   keras.layers.Dense( 10, activation='softmax') ])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, verbose=0)
<keras.src.callbacks.history.History at 0x2b2c9d2d0>
x = X_test[4].reshape(1,28,28)
print(np.argmax(model.predict(x), axis=-1))
model.predict(x)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step
[6]

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step
array([[1.4274293e-01, 6.3669607e-03, 2.2221352e-01, 3.1836677e-02,
        2.4732779e-01, 2.3197448e-03, 3.3471039e-01, 7.6046708e-05,
        1.1841005e-02, 5.6493259e-04]], dtype=float32)
y_test_pred = [np.argmax(i) for i in model.predict(X_test)]
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 287us/step
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_test_pred)
0.7268
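
A common refinement for this dataset (not applied above) is to scale the pixel intensities from [0, 255] down to [0, 1] before training, which usually speeds up convergence and improves accuracy. A sketch that retrains the same architecture on scaled data:

# scale pixel values to [0, 1] and retrain the same architecture
X_train_s, X_test_s = X_train / 255.0, X_test / 255.0

model_s = keras.models.Sequential([
    keras.layers.Input(shape=[28, 28]),
    keras.layers.Flatten(),
    keras.layers.Dense(300, activation='relu'),
    keras.layers.Dense(50, activation='relu'),
    keras.layers.Dense(10, activation='softmax')])

model_s.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_s.fit(X_train_s, y_train, epochs=5, verbose=0)

y_pred_s = np.argmax(model_s.predict(X_test_s), axis=-1)
accuracy_score(y_test, y_pred_s)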