Logistic Regression

Despite its name, Logistic Regression is used for classification tasks.

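  • It is conceptually similar to linear regression: both compute a weighted sum of the features plus an intercept, but logistic regression passes that sum through the sigmoid function to obtain a probability.

  • The model calculates the probability of each class and predicts the class with the highest probability.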

Theory

  • Sigmoid function \(\displaystyle \sigma(z) = \frac{1}{1+e^{-z}}\)

    • Its output always lies strictly between 0 and 1.

import math

def sigmoid(z):
    return 1/(1+math.e**(-z))

sigmoid(0)
0.5
import matplotlib.pyplot as plt
import numpy as np
x_values = np.linspace(-10, 10, 1000)
y_values = sigmoid(x_values)

plt.vlines(0, 0, 1, color='red')         # vertical reference line at z = 0
plt.hlines(0, -10, 10, color='red')      # horizontal reference line at height 0
plt.plot(x_values, y_values, c='navy');  # the sigmoid curve
[Figure: the sigmoid curve plotted for inputs from -10 to 10, with red reference lines through 0]
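
The `sigmoid` function defined above works well for moderate inputs, but `math.e**(-z)` can overflow when `z` is a large negative number. The sketch below shows one common way to avoid this with NumPy. The name `stable_sigmoid` is an illustrative choice and is not part of the original notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in exp() for inputs of large magnitude."""
    z = np.asarray(z, dtype=float)
    # exp(-|z|) always lies in (0, 1], so it cannot overflow.
    e = np.exp(-np.abs(z))
    # For z >= 0 use 1 / (1 + e^{-z}); for z < 0 use e^{z} / (1 + e^{z}).
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(stable_sigmoid(0))
print(stable_sigmoid(np.array([-1000.0, 1000.0])))
```

Both branches are algebraically equal to \(\frac{1}{1+e^{-z}}\); they differ only in which form is safe to evaluate.
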
  • Assume there are \(n\) features, represented by the variables: \( x_1, x_2, \dots, x_n \).

  • The logistic regression algorithm uses the following function to make predictions:

    • \(\hat{p} = \sigma(w_1x_1+w_2x_2+\dots+w_nx_n+b)\)

    • \( \hat{y} = \begin{cases} 1 & \hat{p}>0.5 \\ 0 & \hat{p}\le 0.5 \\ \end{cases}\)

where

  • \(\hat{p}\) is the predicted probability value

  • \(\hat{y}\) is the predicted class value

  • \(w_1, w_2, \dots, w_n\) are the coefficients (weights)

  • \(b\) is the intercept (bias).
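
As a small worked example of the prediction rule above, the following sketch computes \(\hat{p}\) and \(\hat{y}\) for a single sample with three features. The weights, intercept, and feature values are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters for n = 3 features (illustrative values only).
w = np.array([0.5, -1.0, 2.0])
b = -0.5

# One hypothetical sample.
x = np.array([1.0, 2.0, 0.5])

z = w @ x + b              # 0.5 - 2.0 + 1.0 - 0.5 = -1.0
p_hat = sigmoid(z)         # sigma(-1), roughly 0.2689
y_hat = int(p_hat > 0.5)   # 0, because p_hat is not above 0.5

print(z, p_hat, y_hat)
```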

The logistic regression algorithm determines the coefficients and the intercept by fitting the model to the training set.

  • Since \(\hat{p}\) falls between 0 and 1, it can be interpreted as the probability that the sample belongs to class 1; the probability of class 0 is then \(1-\hat{p}\).
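
To give a feel for what "fitting" means, the sketch below estimates the weights and the intercept with plain gradient descent on the average log loss. This is an assumption made for illustration: scikit-learn's `LogisticRegression` uses a different solver (`lbfgs` by default) and adds L2 regularization, so its fitted parameters will differ. The function name `fit_logistic`, the learning rate, the iteration count, and the toy data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Plain gradient descent on the mean log loss (no regularization)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p_hat = sigmoid(X @ w + b)      # current predicted probabilities
        error = p_hat - y               # derivative of the log loss w.r.t. z
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Tiny hypothetical data set: one feature, class 1 for the larger values.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 1, 1])

w, b = fit_logistic(X_toy, y_toy)
p_hat = sigmoid(X_toy @ w + b)
y_hat = (p_hat > 0.5).astype(int)
print(y_hat)
```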

Coding

from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
from sklearn.linear_model import LogisticRegression
logistic_reg = LogisticRegression(max_iter=2000)
logistic_reg.fit(X_train, y_train)
/Users/yusufdanisman/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
LogisticRegression(max_iter=2000)
logistic_reg.score(X_train, y_train)
0.960093896713615
logistic_reg.score(X_test, y_test)
0.951048951048951
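
The `ConvergenceWarning` above recommends scaling the data. One possible way to follow that advice, sketched here under the assumption that standardizing every feature is appropriate for this data set, is to chain `StandardScaler` and `LogisticRegression` in a pipeline. The variable name `scaled_logistic_reg` is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fitted on the training data only, then applied before the classifier.
scaled_logistic_reg = make_pipeline(StandardScaler(), LogisticRegression())
scaled_logistic_reg.fit(X_train, y_train)

train_accuracy = scaled_logistic_reg.score(X_train, y_train)
test_accuracy = scaled_logistic_reg.score(X_test, y_test)
print(train_accuracy, test_accuracy)
```

With standardized features the solver usually needs far fewer iterations, so the default `max_iter` is often enough. Note that the coefficients of the scaled model refer to standardized features and are not directly comparable to those of the unscaled model.
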
logistic_reg.predict(X_test[:1])
array([0])
logistic_reg.predict_proba(X_test[:1])
array([[0.99517313, 0.00482687]])
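
The two columns returned by `predict_proba` follow the order of `classes_`, so the first column is the probability of class 0 and the second the probability of class 1. The sketch below refits the same model so that it is self-contained and checks that `predict` agrees with picking the most probable class. Silencing the `ConvergenceWarning` is a choice made here only to keep the output short.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    logistic_reg = LogisticRegression(max_iter=2000).fit(X_train, y_train)

proba = logistic_reg.predict_proba(X_test)   # shape (n_samples, 2)
row_sums = proba.sum(axis=1)                 # each row should sum to 1

# Pick the class whose column holds the largest probability.
pred_from_proba = logistic_reg.classes_[proba.argmax(axis=1)]
agree = np.array_equal(pred_from_proba, logistic_reg.predict(X_test))
print(logistic_reg.classes_, agree)
```
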
logistic_reg.coef_
array([[ 0.66682319,  0.16318559, -0.15398642,  0.02180548, -0.15122598,
        -0.17307583, -0.35844771, -0.21710126, -0.30766858, -0.02684995,
        -0.03265615,  1.02404177, -0.02402872, -0.10833917, -0.01314999,
         0.07205555, -0.00398048, -0.02353936, -0.03121466,  0.01682862,
         0.27634342, -0.39166997, -0.22304858, -0.01042741, -0.26548139,
        -0.58408071, -1.15782012, -0.4636791 , -0.66474681, -0.08323585]])
logistic_reg.coef_.shape
(1, 30)
logistic_reg.intercept_
array([29.21680268])
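
The fitted `coef_` and `intercept_` can be plugged into the formula from the Theory section to reproduce the model's output by hand. The sketch below refits the same model so that it is self-contained, and uses `scipy.special.expit` as the sigmoid. It assumes the binary case shown here, where `coef_` has a single row.

```python
import warnings

import numpy as np
from scipy.special import expit  # numerically stable sigmoid
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    logistic_reg = LogisticRegression(max_iter=2000).fit(X_train, y_train)

w = logistic_reg.coef_.ravel()    # weights w_1, ..., w_n (here n = 30)
b = logistic_reg.intercept_[0]    # intercept b

z = X_test @ w + b                # weighted sum for every test sample
p_hat = expit(z)                  # predicted probability of class 1
y_hat = (p_hat > 0.5).astype(int)

same_proba = np.allclose(p_hat, logistic_reg.predict_proba(X_test)[:, 1])
same_class = np.array_equal(y_hat, logistic_reg.predict(X_test))
print(same_proba, same_class)
```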