App: Statistics Module#

Section Title: Statistics

The statistics module provides the most commonly used statistical functions.

  • First, you need to import the statistics module.

import statistics

You can list the names defined in the statistics module by calling dir(statistics).

  • You can execute help(statistics) for more details.

print(dir(statistics))
['Counter', 'Decimal', 'Fraction', 'LinearRegression', 'NormalDist', 'StatisticsError', '_SQRT2', '__all__', '__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_coerce', '_convert', '_decimal_sqrt_of_frac', '_exact_ratio', '_fail_neg', '_float_sqrt_of_frac', '_integer_sqrt_of_frac_rto', '_isfinite', '_mean_stdev', '_normal_dist_inv_cdf', '_sqrt_bit_width', '_ss', '_sum', 'bisect_left', 'bisect_right', 'correlation', 'covariance', 'defaultdict', 'erf', 'exp', 'fabs', 'fmean', 'fsum', 'geometric_mean', 'groupby', 'harmonic_mean', 'hypot', 'linear_regression', 'log', 'math', 'mean', 'median', 'median_grouped', 'median_high', 'median_low', 'mode', 'mul', 'multimode', 'namedtuple', 'numbers', 'pstdev', 'pvariance', 'quantiles', 'random', 'reduce', 'repeat', 'sqrt', 'stdev', 'sys', 'tau', 'variance']
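The listing above mixes the module's public functions with private helpers and with names it imports from elsewhere (math, sqrt, and so on). A minimal sketch for narrowing it down, assuming you only want the documented API, is to read the module's __all__ attribute. The variable name public_names is only illustrative.

```python
import statistics

# __all__ holds the names the module declares as its public API.
public_names = sorted(statistics.__all__)
print(public_names)

# Names such as 'sqrt' or 'log' show up in dir() only because the module
# imports them for internal use; they are not listed in __all__.
print('mean' in public_names, 'sqrt' in public_names)
```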

Methods#

mean()#

It returns the mean (average) of the provided data.

data = [1,2,3,3,4,4,4,5,5,6]
print(statistics.mean(data))
3.7
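As a quick check of what mean() computes, the sketch below reproduces the result by hand and compares it with fmean(), the float-only variant from the same module. The variable names manual_mean and fast_mean are illustrative.

```python
import statistics

data = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6]

# The mean is the sum of the values divided by how many there are.
manual_mean = sum(data) / len(data)

# fmean() converts every value to float before averaging.
fast_mean = statistics.fmean(data)

print(manual_mean, statistics.mean(data), fast_mean)
```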

median()#

It returns the median of the provided data.

data = [1,2,3,3,4,4,4,5,5,6]
print(statistics.median(data))
4.0
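How the median is chosen depends on whether the number of values is odd or even. The following sketch, using small made-up lists, shows both cases along with median_low() and median_high(), which return an actual data value instead of an average.

```python
import statistics

odd_data = [1, 3, 5]
even_data = [1, 2, 3, 4]

middle_odd = statistics.median(odd_data)     # the single middle value
middle_even = statistics.median(even_data)   # average of the two middle values
low = statistics.median_low(even_data)       # smaller of the two middle values
high = statistics.median_high(even_data)     # larger of the two middle values

print(middle_odd, middle_even, low, high)
```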

mode()#

It returns the mode (most frequent element) of the provided data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.mode(data))
3
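When several values are tied for most frequent, mode() reports only one of them. The sketch below assumes Python 3.8 or later, where mode() returns the first tied value it encounters and multimode() returns all of them. The list tied is made-up example data.

```python
import statistics

tied = [1, 1, 2, 2, 3]

first_mode = statistics.mode(tied)       # first of the tied values encountered
all_modes = statistics.multimode(tied)   # every value sharing the highest count

print(first_mode, all_modes)
```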

pstdev()#

It returns the population standard deviation of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.pstdev(data))
1.42828568570857

stdev()#

It returns the sample standard deviation of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.stdev(data))
1.505545305418162
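The only difference between pstdev() and stdev() is the divisor: n for the population version and n - 1 for the sample version. A minimal sketch that recomputes both by hand, with illustrative variable names:

```python
import math
import statistics

data = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both formulas.
squared_deviations = sum((x - mean) ** 2 for x in data)

population_sd = math.sqrt(squared_deviations / n)        # divisor n
sample_sd = math.sqrt(squared_deviations / (n - 1))      # divisor n - 1

print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```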

pvariance()#

It returns the population variance of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.pvariance(data))
2.04

variance()#

It returns the sample variance of the given data.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.variance(data))
2.2666666666666666
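The variance functions relate to the standard deviation functions in a simple way, which the sketch below checks numerically: each variance is the square of the corresponding standard deviation, and the sample variance equals the population variance scaled by n / (n - 1). Variable names are illustrative.

```python
import statistics

data = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6]
n = len(data)

sample_var = statistics.variance(data)
population_var = statistics.pvariance(data)

# Variance is the square of the corresponding standard deviation.
sd_squared = statistics.stdev(data) ** 2

# The two variances differ only by the factor n / (n - 1).
rescaled = population_var * n / (n - 1)

print(sample_var, sd_squared, rescaled)
```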

quantiles()#

By default, it returns three cut points (the quartiles) that divide the data into four intervals, each containing approximately 25 percent of the data.

  • The parameter n specifies the number of intervals, with a default value of 4; the function returns n - 1 cut points.

data = [1,2,3,3,3,4,4,5,5,6]
print(statistics.quantiles(data))
[2.75, 3.5, 5.0]
data = [1,2,3,4,5,6,7,8,9,10]
print(statistics.quantiles(data))
[2.75, 5.5, 8.25]
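To illustrate the n parameter, the sketch below asks for deciles and for a single cut point, and also tries the method='inclusive' option, which treats the data as the whole population rather than a sample. Variable names are illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

deciles = statistics.quantiles(data, n=10)   # 9 cut points, 10 intervals
halves = statistics.quantiles(data, n=2)     # 1 cut point: the median

# The default method is 'exclusive'; 'inclusive' assumes the data
# already contains the smallest and largest values of the population.
inclusive_quartiles = statistics.quantiles(data, n=4, method='inclusive')

print(deciles)
print(halves)
print(inclusive_quartiles)
```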

correlation()#

It returns Pearson’s correlation coefficient for two data sets of equal length.

  • Pearson’s correlation coefficient is a value between -1 and +1 that indicates the strength and direction of the linear relationship between the two variables.

data1 = [1,2,3,3,3,4,4,5,5,6]
data2 = [4,3,3,2,2,2,3,3,3,1]
print('Pearson’s correlation coefficient:', statistics.correlation(data1, data2))
Pearson’s correlation coefficient: -0.5776155346615539

linear_regression()#

It returns the slope and intercept of a simple linear regression model fitted by ordinary least squares, that is, the line that lies closest to the points in the scatter plot of the data.

import matplotlib.pyplot as plt

x = [1,2,3,4,5,6,7,8,9,10]
y = [1,3,2,5,6,9,8,7,10,9]

# Fit the least-squares line y = slope*x + intercept
model = statistics.linear_regression(x, y)
slope = model.slope
intercept = model.intercept

# Scatter plot of the observed points
plt.scatter(x, y, label='actual points')

# Endpoints of the fitted line at the smallest and largest x values
x1, y1 = 1, 1*slope+intercept
x2, y2 = 10, 10*slope+intercept

plt.plot([x1, x2], [y1, y2], color='r', label='linear model')
plt.legend();
(Figure: scatter plot of the actual points with the fitted linear model drawn as a red line.)

sqrt()#

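It is the square root function. The syntax is statistics.sqrt(x). Note that sqrt is not part of the module's documented API: it is math.sqrt, which the statistics module imports for its own internal use, so calling math.sqrt() directly is the more reliable choice.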

statistics.sqrt(36)
6.0
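Because the name sqrt is only visible in the statistics module as a side effect of its imports, a safer sketch calls the math module directly. The check against __all__ shows that sqrt is not among the names the statistics module exports. math.isqrt() assumes Python 3.8 or later.

```python
import math
import statistics

root = math.sqrt(36)          # floating-point square root
exact_root = math.isqrt(36)   # integer square root

# sqrt is not among the names the statistics module declares as public.
is_public = 'sqrt' in statistics.__all__

print(root, exact_root, is_public)
```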

log()#

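It returns the logarithm of the given value to the specified base, which defaults to \(e\). This is math.log, which the statistics module imports for internal use; it is not a documented part of the statistics API.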

# log base e
statistics.log(1)
0.0
# log base 10
statistics.log(1000, 10)
2.9999999999999996
# log base 2
statistics.log(32, 2)
5.0
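The base-10 result above is slightly off from 3 because the two-argument form divides one floating-point logarithm by another. As a sketch of an alternative, the math module offers dedicated log10() and log2() functions, which are usually more accurate for those bases. Variable names are illustrative.

```python
import math

general = math.log(1000, 10)   # computed as log(1000) / log(10)
base10 = math.log10(1000)      # dedicated base-10 logarithm
base2 = math.log2(32)          # dedicated base-2 logarithm

print(general, base10, base2)
```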