App: Statistics Module

App: Statistics Module#

Section Title: Statistics

The statistics module provides most commonly used statistical functions.

First, you need to import the statistics module.

import statistics

You can access the list of methods in the statistics module by using the dir(statistics) function.

You can execute help(statistics) for more details.

print(dir(statistics))

['Counter', 'Decimal', 'Fraction', 'LinearRegression', 'NormalDist', 'StatisticsError', '_SQRT2', '__all__', '__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_coerce', '_convert', '_decimal_sqrt_of_frac', '_exact_ratio', '_fail_neg', '_float_sqrt_of_frac', '_integer_sqrt_of_frac_rto', '_isfinite', '_mean_stdev', '_normal_dist_inv_cdf', '_sqrt_bit_width', '_ss', '_sum', 'bisect_left', 'bisect_right', 'correlation', 'covariance', 'defaultdict', 'erf', 'exp', 'fabs', 'fmean', 'fsum', 'geometric_mean', 'groupby', 'harmonic_mean', 'hypot', 'linear_regression', 'log', 'math', 'mean', 'median', 'median_grouped', 'median_high', 'median_low', 'mode', 'mul', 'multimode', 'namedtuple', 'numbers', 'pstdev', 'pvariance', 'quantiles', 'random', 'reduce', 'repeat', 'sqrt', 'stdev', 'sys', 'tau', 'variance']

Methods#

mean()#

It returns the mean (average) of the provided data.

data = [1,2,3,3,4,4,4,5,5,6]
print(statistics.mean(data))

3.7

median()#

It returns the median of the provided data.

data = [1,2,3,3,4,4,4,5,5,6]
print(statistics.median(data))

4.0

mode()#

It returns the mode (most frequent element) of the provided data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.mode(data))

pstdev()#

It returns the population standard deviation of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.pstdev(data))

1.42828568570857

stdev()#

It returns the sample standard deviation of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.stdev(data))

1.505545305418162

pvariance()#

It returns the population variation of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.pvariance(data))

2.04

variance()#

It returns the sample variation of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.variance(data))

2.2666666666666666

quantiles()#

It returns three points that divide the data into four intervals, each containing approximately 25 percent of the data.

The parameter n specifies the number of intervals, with a default value of 4.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.quantiles(data))

[2.75, 3.5, 5.0]

data = [1,2,3,4,5,6,7,8,9,10]
print(statistics.quantiles(data))

[2.75, 5.5, 8.25]

correlation()#

It returns the Pearson’s correlation coefficient of two given data.

By default, it computes Pearson’s correlation coefficient, a value ranging between -1 and +1, indicating the strength and direction of a linear relationship between variables.

data1 = [1,2,3,3,3,4,4,5,5,6]
data2 = [4,3,3,2,2,2,3,3,3,1]
print('Pearson’s correlation       coefficient:', statistics.correlation(data1, data2))

Pearson’s correlation       coefficient: -0.5776155346615539

linear_regression()#

It returns and returns the slope and intercept of the linear regression model, representing the closest line to the scatter plot of the data.

x = [1,2,3,4,5,6,7,8,9,10]
y = [1,3,2,5,6,9,8,7,10,9]

model = statistics.linear_regression(x, y)
slope = model.slope
intercept = model.intercept

import matplotlib.pyplot as plt
plt.scatter(x,y, label='actual points')

x1, y1 = 1, 1*slope+intercept
x2, y2 = 10, 10*slope+intercept

plt.plot([x1, x2], [y1, y2], color='r', label='linear model')
plt.legend();

_images/568175645df25571222cc7a592bc7944a8373802cafe50be1d91f55bf3ac5118.png

sqrt()#

It is the square root function. The syntax is statistics.sqrt()

statistics.sqrt(36)

6.0

log()#

It returns the logarithm of the given value to the specified base, which defaults to \(e\).

# log base e
statistics.log(1)

0.0

# log base 10
statistics.log(1000, 10)

2.9999999999999996

# log base 2
statistics.log(32, 2)

5.0